Hi Sid,
Let me share the outcome of our offline discussion with the mailing list:
As far as I understand it, Genode's init has the feature of restarting direct service clients when their session disappears, but this doesn't apply in your scenario because the server is wrapped in an additional sub-init.
In that case, you have to take care of restarting your clients manually. As far as I know, a client deliberately doesn't consider the case that the outside world terminates its session. So it seems natural to me that you run into unpredictable behavior unless you have some kind of manager that kills the client before terminating its session.
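To illustrate the manager approach, here is a hypothetical dynamic init configuration as a manager component might generate it. All names, quotas, and routes are illustrative, not taken from Sid's scenario. The sketch relies on init's behavior of restarting a child whenever the `version` attribute of its `<start>` node changes, so the manager can bump the client's version (killing and restarting it) before it removes the server that backs the client's `File_system` session:

```xml
<!-- Hypothetical manager-generated init config (illustrative names).
     Changing version="1" to version="2" prompts init to restart
     fs_client1; the manager does this before dropping the server's
     <start> node, so the client never observes a vanished session. -->
<config>
  <parent-provides>
    <service name="ROM"/> <service name="PD"/> <service name="CPU"/>
    <service name="LOG"/> <service name="File_system"/>
  </parent-provides>
  <start name="fs_client1" version="2" caps="200">
    <resource name="RAM" quantum="8M"/>
    <route>
      <service name="File_system"> <parent/> </service>
      <any-service> <parent/> </any-service>
    </route>
  </start>
</config>
```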
I hope this is of help?
Cheers, Martin
On 03.02.23 14:19, Sid Hussmann wrote:
Hi Martin,
Thank you very much for the list of commits! As we are still dealing with a driver issue with the Genode 22.11 release [1], I cherry-picked these commits to our fork based on the 22.08 release.
I'm not sure how much value it has to you when my tests are based on 22.08 (with the commits you mentioned), but in case you are curious here are my findings.
After running my scenario multiple times, the system does not behave the same way in each iteration. There are two different behaviors that I noticed:
- The two clients crash, which is good for the overall system, as a heartbeat monitor can recover it into the desired state again:
no RM attachment (READ pf_addr=0x100004 pf_ip=0x10e3e194 from pager_object: pd='init -> init -> fs_client1' thread='ep')
Warning: page fault, pager_object: pd='init -> init -> fs_client1' thread='ep' ip=0x10e3e194 fault-addr=0x100004 type=no-page
[init -> init] Error: A fault in the pd of child 'fs_client1' was detected
Kernel: IPC await request: bad state, will block
Warning: page fault, pager_object: pd='init -> init -> fs_client2' thread='pthread.0' ip=0x6f4b0 fault-addr=0x403befd0 type=no-page
[init -> init -> fs -> init] Error: Uncaught exception of type 'Genode::Id_space<Genode::Parent::Client>::Unknown_id'
[init -> init -> fs -> init] Warning: abort called - thread: ep
- `rump` (short for a `vfs_server` with the `rump` plugin) restarts while the rest of the system does not print any log messages. In this case we cannot recover via the heartbeat monitor, as there is no change in the `init` state report. Further, the clients don't seem to function correctly; for example, one of them is a TCP server that no longer responds to network traffic. Could it be that the `vfs` plugin somehow can't handle the interruption of the `File_system` session?
[init -> init -> fs -> init -> rump] rump: /genode: file system not clean; please fsck(8)
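For reference, the heartbeat monitoring mentioned above can be enabled in init's configuration, to the best of my knowledge, via a `<heartbeat>` node (a stock Genode feature). Init then periodically polls its children and flags unresponsive ones in its state report with a `skipped_heartbeats` attribute, which a monitor can watch to trigger a restart. Rates and child names below are illustrative. Note this only helps when a child stops responding; in the hang described above, the clients apparently still answer heartbeats while being dysfunctional, so the report stays unchanged:

```xml
<!-- Sketch: heartbeat monitoring in an init config (illustrative values).
     Children that miss heartbeats appear in init's state report with
     a skipped_heartbeats attribute. -->
<config>
  <heartbeat rate_ms="1000"/>
  <report delay_ms="500"/>
  <start name="fs_client2" caps="300">
    <resource name="RAM" quantum="16M"/>
    <route> <any-service> <parent/> </any-service> </route>
  </start>
</config>
```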
I'm not sure if this information is of value to you, especially since my scenario is based on Genode 22.08. I will test this again once we have moved to the 22.11 or 23.02 release.
[1] https://lists.genode.org/pipermail/users/2023-January/008356.html
Cheers, Sid
Genode users mailing list users@lists.genode.org https://lists.genode.org/listinfo/users