Howdy list, Norman,
I have a (real-world) scenario along those lines, probably a subset / special case of the conversation so far. Discussing it here will help me wrap my head around this aspect of Genode, I think:
Let's say you're writing some high-reliability software (take that with a heavy pinch of salt: this is not about airplane embedded software or NPP programs, I'm thinking of humble Radio Automation Software!).
In the radio industry, the 'cardinal sin' is going silent. You don't want your software to stop broadcasting music and go dead for a prolonged time (though crashing and being auto-restarted in a few seconds is tolerated to some degree). Especially if it happens at night when nobody's in the studio to "kick" the computer back into life until the next morning.
If there are no crashing bugs, no automation bugs, and no memory leaks, you're golden -- the above wish is realized.
Focusing on the topic at hand though: what if there is a memory leak that, in Genode's case, ends up as a blocking "resource request", whereby the broadcast app just sits there doing nothing instead of being restarted from a clean slate?
Software written against ffmpeg or libav and the like tends to be fussy, and hard to get 100% right memory-wise (ask me how I know *g*). Hopefully that won't be the case on Genode, but let's say it is, for the sake of argument. Let's imagine my app's memory usage creeps up continuously by several MBs per day, such that it eventually exhausts its assigned memory quota, no matter how generous, once enough time has elapsed.
What is the "Genode doctrine" for dealing with that, for writing highly reliable code?
To get the discussion started, I'll come up with a few ideas. I'm not sure which of these are already implemented/possible in Genode. In my ongoing exploration of the source I've noticed something called "sequence" (?) (I think it's in repos/ports or something) that allows a component to be launched again each time it quits; maybe that would help. Anyway, here goes:
- init/core (is configured to) kill the child as soon as it oversteps its RAM quota (or capability quota); then restart it, either in lock-step or by another component (sequence?), in order to keep the "init" component as simple as possible.
- the program itself detects that it has gone over-quota, and commits "suicide", then gets restarted (by e.g. "sequence" mentioned above)
- the program implements a hook (callback) for handling quota going over-bound, which simply calls exit(), and "sequence" handles the restarting.
- "sequence" itself is the direct parent (instead of init) of the radio app, and thus is the recipient of RAM quota requests, and handles them differently from init: it kills and restarts the app when receiving such a request.
- "sequence" is the direct parent, and monitors the RAM usage of its child (say, once per second), in case the above method does not work for some reason
- other ideas?
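To make the "parent monitors the child's RAM usage" idea a bit more concrete, here is a rough sketch of the policy I have in mind. It is plain C++ and deliberately uses no Genode API at all: Child_model, decide, poll_once and simulate are names I made up for illustration, and the "usage" is just a number that grows on every poll. How a real parent would actually obtain that number on Genode is exactly the part I don't know.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a leaky child (NOT a Genode interface, just numbers).
struct Child_model
{
    std::uint64_t used_bytes;     // current RAM usage
    std::uint64_t baseline_bytes; // usage right after a clean start
    std::uint64_t leak_per_tick;  // bytes leaked per polling interval
    unsigned      restarts;       // restarts performed so far
};

// Decision the monitoring parent takes on each poll.
enum class Action { KEEP_RUNNING, RESTART };

// Restart *before* the quota is hit: trigger once usage reaches
// 'threshold_percent' of the quota (integer math, no floating point).
inline Action decide(std::uint64_t used, std::uint64_t quota,
                     unsigned threshold_percent)
{
    assert(quota > 0 && threshold_percent > 0 && threshold_percent <= 100);
    return (used * 100 >= quota * threshold_percent) ? Action::RESTART
                                                     : Action::KEEP_RUNNING;
}

// One polling step: the child leaks a bit, then the parent checks it and,
// if needed, "restarts" it by resetting usage to the clean-start baseline.
inline Action poll_once(Child_model &child, std::uint64_t quota,
                        unsigned threshold_percent)
{
    child.used_bytes += child.leak_per_tick;

    Action const action = decide(child.used_bytes, quota, threshold_percent);
    if (action == Action::RESTART) {
        child.used_bytes = child.baseline_bytes;
        ++child.restarts;
    }
    return action;
}

// Run 'ticks' polling steps and return how many restarts happened.
inline unsigned simulate(Child_model &child, std::uint64_t quota,
                         unsigned threshold_percent, unsigned ticks)
{
    unsigned const before = child.restarts;
    for (unsigned i = 0; i < ticks; ++i)
        poll_once(child, quota, threshold_percent);
    return child.restarts - before;
}
```

The point of the threshold is that the restart happens at a moment of the parent's choosing, before the child ever gets stuck in a blocking resource request. With a leak of a few MBs per day and a generous quota, that would mean one short, planned interruption every so often rather than dead air until the morning.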
Cedric