Restarting siblings vs. restarting nieces

Martin Stein martin.stein at genode-labs.com
Sun Feb 5 13:42:06 CET 2023


Hi Sid,

Let me share the outcome of our offline discussion with the mailing list:

As far as I understand it, Genode's init can restart direct service
clients when their session disappears, but this doesn't apply in your
scenario because the server is wrapped in an additional sub-init.

In such a case, you have to take care of restarting your clients
manually. A client, AFAIK, deliberately doesn't consider the case that
the outside world terminates its session. So it seems natural to me
that you run into unpredictable behavior if you don't have some kind of
manager that kills the client before terminating its session.
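
The ordering described above can be sketched as follows. This is a
minimal, Genode-independent Python model, not Genode API: the `Manager`
class and its methods are hypothetical and only record the order of
actions. The component names `fs`, `fs_client1` and `fs_client2` are
taken from the log output quoted below. How a real manager would stop
and start components (for example by regenerating the configuration of
a dynamic init) is assumed and not shown.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Manager:
    """Hypothetical manager that records the order of stop/start actions."""
    server: str
    clients: List[str]
    log: List[str] = field(default_factory=list)

    def _stop(self, name: str) -> None:
        # Placeholder: a real manager would remove the component here.
        self.log.append(f"stop {name}")

    def _start(self, name: str) -> None:
        # Placeholder: a real manager would (re)create the component here.
        self.log.append(f"start {name}")

    def restart_server(self) -> List[str]:
        # 1. Kill all clients first, so none of them ever observes
        #    its session being terminated from the outside.
        for client in self.clients:
            self._stop(client)
        # 2. Only now take down the server and bring it back.
        self._stop(self.server)
        self._start(self.server)
        # 3. Start fresh clients that open new sessions.
        for client in self.clients:
            self._start(client)
        return self.log
```

Calling `Manager("fs", ["fs_client1", "fs_client2"]).restart_server()`
returns the actions in the order clients down, server down, server up,
clients up. The point of the sketch is only that the server's session is
never closed while a client that depends on it is still running.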

I hope this helps.

Cheers,
Martin

On 03.02.23 14:19, Sid Hussmann wrote:
> Hi Martin,
> 
> Thank you very much for the list of commits! As we are still dealing with a driver issue with the Genode 22.11 release [1], I cherry-picked these commits to our fork based on the 22.08 release.
> 
> I'm not sure how much value it has to you when my tests are based on 22.08 (with the commits you mentioned), but in case you are curious here are my findings.
> 
> After running my scenario multiple times, the system does not behave the same way in each iteration. There are two different behaviors that I noticed:
> 
> 1. The two clients crash, which is good for the overall system because a heartbeat monitor can then recover the system into the desired state:
> ```
> no RM attachment (READ pf_addr=0x100004 pf_ip=0x10e3e194 from pager_object: pd='init -> init -> fs_client1' thread='ep')    
> Warning: page fault, pager_object: pd='init -> init -> fs_client1' thread='ep' ip=0x10e3e194 fault-addr=0x100004 type=no-page  
> [init -> init] Error: A fault in the pd of child 'fs_client1' was detected  
> Kernel: IPC await request: bad state, will block  
> Warning: page fault, pager_object: pd='init -> init -> fs_client2' thread='pthread.0' ip=0x6f4b0 fault-addr=0x403befd0 type=no-page  
> [init -> init -> fs -> init] Error: Uncaught exception of type 'Genode::Id_space<Genode::Parent::Client>::Unknown_id'  
> [init -> init -> fs -> init] Warning: abort called - thread: ep
> ```
> 
> 2. `rump` (short for a `vfs_server` with the `rump` plugin) restarts while the rest of the system does not print any log messages. In this case we cannot recover via the heartbeat monitor because the `init` state report does not change. Further, the clients don't seem to function correctly. For example, one of them is a TCP server that no longer responds to network traffic. Could it be that the `vfs` plugin somehow can't handle the interruption of the `File_system` session?
> ```
> [init -> init -> fs -> init -> rump] rump: /genode: file system not clean; please fsck(8)
> ```
> 
> I'm not sure if this information is of value to you, especially since my scenario is based on Genode 22.08. I will test this again once we have moved to the 22.11 or the 23.02 release.
> 
> 
> [1] https://lists.genode.org/pipermail/users/2023-January/008356.html
> 
> Cheers,
> Sid