Hi Cedric,
great that you are chiming in. Your posting made me realize that I failed to present a key piece of the puzzle in my reply to Alexander.
In my reply, I emphasized Genode's mantra of decoupling (often complex and potentially flaky) functionality from the policy of what happens when things go wrong. The functionality is the job of a "child" whereas the policy should be in the hands of the "parent". But I left open what the "parent" looks like in practice. You stumbled over the sequence tool, which is actually a good example of a parent. But it is not a good fit for the problem at hand.
The appropriate solution is the use of a dynamically configured init component (let's call it "dynamic init") in tandem with a management component that (1) monitors the state of the dynamic init and its children, and (2) feeds the dynamic init with configurations.
          Child
            ^
            |          'state' report
         Dynamic  ------------------->  Manager
          Init    <-------------------
                       'config' ROM
Both the dynamic init and the management component are siblings within another (e.g., the static initial) init instance. The 'state' report and the 'config' ROM are propagated via the report_rom component.
(1) Init supports the reporting of its current state including the state of all children. See [1] for an overview of the reporting options. E.g., in Sculpt, you can have a look at the reported state of the runtime subsystem by looking at /report/runtime/state. The report captures - among other things - the resource consumption of each child. Should a child overstep its resource boundaries, the respective <child> node looks like this:
  <state>
    ..
    <child name="noux-system" ...>
      <ram ... requested="2M"/>
      ..
    </child>
  </state>
If the 'requested' attribute is present, the child got stuck in a resource request. In the example above, the "noux-system" asks for 2M of additional memory.
[1] https://genode.org/documentation/genode-foundations/19.05/system_configurati...
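To make the check for a pending resource request concrete, here is a minimal sketch of the detection logic. It is written in Python only for illustration; a real manager would be a Genode component. The function name and the sample report are made up for this example, and the sample keeps only the attributes shown above.

```python
import xml.etree.ElementTree as ET

def pending_ram_requests(state_xml):
    """Return {child name: requested amount} for children stuck in a RAM request."""
    root = ET.fromstring(state_xml)
    pending = {}
    for child in root.findall("child"):
        ram = child.find("ram")
        # The mere presence of 'requested' marks a child that waits for more RAM.
        if ram is not None and "requested" in ram.attrib:
            pending[child.get("name")] = ram.get("requested")
    return pending

# Hypothetical, stripped-down state report (real reports carry more attributes).
STATE = """
<state>
  <child name="noux-system"> <ram requested="2M"/> </child>
  <child name="other">       <ram/>                </child>
</state>
"""

print(pending_ram_requests(STATE))
```

The result maps each starved child to the amount it asked for, which is all the manager needs in order to decide how to respond.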
To resolve this situation, the manager can generate a new configuration for the dynamic init. In particular, it can
- Adjust the resource quota of the resource-starved child. When the dynamic init observes such a configuration change, it answers the resource request and triggers the child to continue.
- Restart the child by incrementing a 'version' attribute of the child node. Once the dynamic init observes a change of this attribute, the child is killed and restarted.
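The two responses boil down to small rewrites of the configuration handed to the dynamic init. The following Python sketch only illustrates that transformation. It assumes that each child is declared as a <start> node carrying a <resource name="RAM" quantum="..."/> sub node and an optional 'version' attribute; the helper names grant_ram and restart are hypothetical.

```python
import xml.etree.ElementTree as ET

UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}

def parse_bytes(text):
    """Convert '2M', '512K', or '4096' into a number of bytes."""
    if text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)

def grant_ram(config_xml, child_name, requested):
    """Return a new config with the child's RAM quota raised by 'requested'."""
    root = ET.fromstring(config_xml)
    for start in root.findall("start"):
        if start.get("name") != child_name:
            continue
        for res in start.findall("resource"):
            if res.get("name") == "RAM":
                total = parse_bytes(res.get("quantum")) + parse_bytes(requested)
                res.set("quantum", str(total))  # plain byte count, no unit suffix
    return ET.tostring(root, encoding="unicode")

def restart(config_xml, child_name):
    """Return a new config with the child's 'version' attribute incremented."""
    root = ET.fromstring(config_xml)
    for start in root.findall("start"):
        if start.get("name") == child_name:
            start.set("version", str(int(start.get("version", "0")) + 1))
    return ET.tostring(root, encoding="unicode")

# Hypothetical configuration of the dynamic init with a single child.
CONFIG = ('<config><start name="noux-system">'
          '<resource name="RAM" quantum="10M"/></start></config>')

print(grant_ram(CONFIG, "noux-system", "2M"))
print(restart(CONFIG, "noux-system"))
```

Whether the manager grants the request or bumps the version is pure policy, which is exactly the part that should stay out of the child.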
In addition to responding to resource requests, the manager component can also evaluate other parts of the report. The two most interesting bits of information are the exit state of each child (featuring the exit code) and the health. The health is the child's ability to respond to external events. It is described in more detail at [2]. It effectively allows the manager to implement watchdog functionality.
[2] https://genode.org/documentation/genode-foundations/19.05/system_configurati...
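A watchdog-style policy on top of the report could look like the following Python sketch. Please take the attribute names 'exited' and 'skipped_heartbeats' as assumptions about how the exit code and the health show up in the report, and check them against [2]. The threshold and the function name are made up for illustration.

```python
import xml.etree.ElementTree as ET

def children_to_restart(state_xml, max_skipped=3):
    """Pick children that exited with an error or stopped responding.

    Assumed report attributes: 'exited' (exit code) and
    'skipped_heartbeats' (number of unanswered heartbeats).
    """
    root = ET.fromstring(state_xml)
    victims = []
    for child in root.findall("child"):
        exited = child.get("exited")
        skipped = int(child.get("skipped_heartbeats", "0"))
        failed = exited is not None and exited != "0"
        unresponsive = skipped >= max_skipped
        if failed or unresponsive:
            victims.append(child.get("name"))
    return victims

# Hypothetical state report covering the interesting cases.
STATE = """
<state>
  <child name="radio"  exited="1"/>
  <child name="player" skipped_heartbeats="5"/>
  <child name="logger"/>
  <child name="setup"  exited="0"/>
</state>
"""

print(children_to_restart(STATE))
```

The manager would then bump the 'version' of each listed child in the next configuration it generates.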
Let me stress that in contrast to the child (which implements complex functionality), such a manager component is much less prone to bugs. All it does is consume reports (parsing some XML) and generate configurations (generating XML). It does not need any C runtime, file system, or I/O drivers. It can be implemented w/o any dynamic memory allocations. In contrast to the "child", which is expected to be flaky, the "manager" and the dynamic init are supposed to be correct and trustworthy. The functionality of the "dynamic init" is present in the regular "init" component. So the dynamic management does not add new code complexity.
The mechanisms described above are all available right now. Unfortunately, there are no illustrative examples. Thanks to Alexander's and your emails, I realized that this important component-composition pattern is actually missing from the Genode Foundations book. So I will have to do some homework by adding a section. As of now, you may have a look at the init test, which exercises the mechanisms. E.g., the dynamic response to resource requests is tested at [3]. As another - admittedly quite sophisticated - example, the Sculpt manager component [4] plays the role of the manager for the Sculpt system. E.g., it automatically increases the RAM quota of the depot_rom component as needed.
[3] https://github.com/genodelabs/genode/blob/master/repos/os/recipes/raw/test-i...
[4] https://github.com/genodelabs/genode/tree/master/repos/gems/src/app/sculpt_m...
> - init/core (is configured to) kill the child as soon as it oversteps its RAM quota (or capability quota); then restart it,
>   either in lock-step or by another component (sequence?), in order to keep the "init" component as simple as possible.
> - the program itself detects that it has gone over-quota, and commits "suicide", then gets restarted (by e.g. "sequence"
>   mentioned above)
> - the program implements a hook (callback) for handling quota going over-bound, which simply calls exit(), and
>   "sequence" handles the restarting.
> - "sequence" itself is the direct parent (instead of init) of the radio app, and thus is the recipient of ram quota requests,
>   and handles them differently from init: it kills-restarts the app when receiving such a request.
> - "sequence" is the direct parent, and monitors the ram usage of its child (say, once per second), because the above
>   method does not work somehow
> - other ideas ?
I think all options should be covered by the dynamic init construction described above. The approach can naturally be extended by letting the child "report" additional (higher-level) information about its internal state. The manager may take those reports into account also. E.g., the Sculpt manager consumes the reports generated by drivers and part_block instances to find the Genode partition.
Cedric, did this answer your question?
Cheers
Norman