Hello Genodians,
I observed that when a runtime configuration is deployed in which an app that is already present (downloaded and unpacked) accesses a service on a new server (version), the app gets a Service_denied exception while the new server (version) is being downloaded, verified, or unpacked.
Wouldn't it be better to start all components in the runtime configuration only when all possible downloading is complete?
Otherwise we either need to make components tolerant to Service_denied exceptions or detect such error conditions and restart the runtime entirely.
Cheers Stefan
Hi Stefan,
> I observed that when a runtime configuration is deployed in which an app that is already present (downloaded and unpacked) accesses a service on a new server (version), the app gets a Service_denied exception while the new server (version) is being downloaded, verified, or unpacked.
> Wouldn't it be better to start all components in the runtime configuration only when all possible downloading is complete?
The sculpt manager is already supposed to work that way. A subsystem is started only if all children referred to in the subsystem's routing rules are present in the runtime (by looking at the children's names in the report/runtime/state report). Otherwise, the sculpt manager displays messages like "<subsystem> requires <server-name>" in the "Runtime" dialog and defers the start of the subsystem.
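The precondition described above can be sketched as follows. This is a minimal C++ model under stated assumptions, not the sculpt manager's actual code: the runtime state is reduced to a set of child names taken from the state report, and `Runtime_state`, `Subsystem`, `missing_server`, and `may_start` are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: names of the children currently listed in the
// runtime's state report (report/runtime/state).
using Runtime_state = std::set<std::string>;

// Hypothetical model of a subsystem from config/deploy: its name and the
// names of all children referred to by its <route> rules.
struct Subsystem
{
    std::string              name;
    std::vector<std::string> route_targets;
};

// Returns the name of the first missing server, or an empty string if
// every child referred to by the routing rules is present in the runtime.
std::string missing_server(Subsystem const &s, Runtime_state const &state)
{
    for (std::string const &target : s.route_targets)
        if (state.count(target) == 0)
            return target;
    return {};
}

// The subsystem may be started only if no route target is missing.
bool may_start(Subsystem const &s, Runtime_state const &state)
{
    return missing_server(s, state).empty();
}
```

For deploy config A, a `test` subsystem routing to `temp_report` may start once `temp_report` shows up in the state report. Before that, `missing_server` yields `temp_report`, which corresponds to a "test requires temp_report" message in the "Runtime" dialog.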
I guess that you hit a corner case not properly handled so far. Can you confirm that my understanding of the situation is correct? You already had a server running. Now you changed the pkg version but kept the server's name the same. This triggers the download of the new pkg. While downloading, you start a client. Unexpectedly, the client starts before the server's new pkg is ready. It could very well be that such an on-the-fly version update is the problem. To investigate, I would very much appreciate a simple sequence of steps (preferably using the RAM fs) to reproduce the behavior.
> Otherwise, we either need to make components tolerant to Service_denied exceptions or detect such error conditions and restart the runtime entirely.
We should rather address the corner case. ;-)
Cheers Norman
Hi Norman,
> I guess that you hit a corner case not properly handled so far. Can you confirm that my understanding of the situation is correct? You already had a server running. Now you changed the pkg version but kept the server's name the same. This triggers the download of the new pkg. While downloading, you start a client. Unexpectedly, the client starts before the server's new pkg is ready. It could very well be that such an on-the-fly version update is the problem. To investigate, I would very much appreciate a simple sequence of steps (preferably using the RAM fs) to reproduce the behavior.
Sequence of steps to reproduce:

a) build a pkg/report_rom [1]
b) build a pkg/ram_fs_report with ram_fs and fs_report [2]
c) create a deploy config A with pkg/report_rom and any app [3] that keeps a report connection open [4]
d) modify deploy config A to deploy config B, using pkg/ram_fs_report instead of pkg/report_rom [5]
e) make sure that pkg/report_rom and the used app, but _not_ pkg/ram_fs_report, are downloaded and extracted
f) start deploy config A by copying it to /config/deploy
g) start deploy config B by copying it to /config/deploy
h) observe the Service_denied exception
I hope this makes my scenario sufficiently reproducible.
Regards Stefan
[1] pkg/report_rom runtime:

  <runtime ram="4M" caps="100" binary="report_rom">
    <provides> <report/> <rom/> </provides>
    <config/>
    <content>
      <rom label="ld.lib.so"/>
      <rom label="report_rom"/>
    </content>
  </runtime>
[2] pkg/ram_fs_report runtime:

  <runtime ram="32M" caps="1000" binary="init">
    <provides> <report/> <rom/> </provides>
    <content>
      <rom label="ld.lib.so"/>
      <rom label="ram_fs"/>
      <rom label="fs_report"/>
      <rom label="fs_rom"/>
      <rom label="vfs.lib.so"/>
    </content>
    <config>
      <parent-provides>
        <service name="CPU"/>
        <service name="LOG"/>
        <service name="PD"/>
        <service name="ROM"/>
      </parent-provides>
      <default-route>
        <any-service> <parent/> <any-child/> </any-service>
      </default-route>
      <default caps="100"/>
      <service name="ROM">
        <default-policy> <child name="fs_rom"/> </default-policy>
      </service>
      <service name="Report">
        <default-policy> <child name="fs_report"/> </default-policy>
      </service>
      <start name="ram_fs">
        <resource name="RAM" quantum="4M"/>
        <provides> <service name="File_system"/> </provides>
        <config>
          <content> </content>
          <policy label_prefix="fs_report -> " root="/" writeable="yes"/>
          <policy label_prefix="fs_rom -> " root="/" writeable="no"/>
        </config>
      </start>
      <start name="fs_report">
        <resource name="RAM" quantum="4M"/>
        <provides> <service name="Report"/> </provides>
        <config> <vfs> <fs/> </vfs> </config>
      </start>
      <start name="fs_rom">
        <resource name="RAM" quantum="4M"/>
        <provides> <service name="ROM"/> </provides>
        <config/>
      </start>
    </config>
  </runtime>
[3] pkg/report_connection runtime:

  <runtime ram="4M" caps="100" binary="report_connection">
    <requires> <report/> <timer/> </requires>
    <config/>
    <content>
      <rom label="ld.lib.so"/>
      <rom label="report_connection"/>
    </content>
  </runtime>
[4] deploy config A:

  <config arch="x86_64">
    <common_routes>
      <service name="ROM" label_last="ld.lib.so"> <parent/> </service>
      <service name="ROM" label_last="init"> <parent/> </service>
      <service name="CPU"> <parent/> </service>
      <service name="PD"> <parent/> </service>
      <service name="LOG"> <parent/> </service>
      <service name="Timer"> <parent/> </service>
    </common_routes>
    <start name="temp_report" pkg="throwException/pkg/report_rom/2018-07-06">
      <config verbose="yes">
        <policy label="brightness" report="brightness"/>
      </config>
    </start>
    <start name="test" pkg="throwException/pkg/report_connection/2018-07-06-l">
      <route>
        <service name="Report"> <child name="temp_report"/> </service>
      </route>
      <config> <vfs> <fs/> </vfs> </config>
    </start>
  </config>
[5] deploy config B:

  <config arch="x86_64">
    <common_routes>
      <service name="ROM" label_last="ld.lib.so"> <parent/> </service>
      <service name="ROM" label_last="init"> <parent/> </service>
      <service name="CPU"> <parent/> </service>
      <service name="PD"> <parent/> </service>
      <service name="LOG"> <parent/> </service>
      <service name="Timer"> <parent/> </service>
    </common_routes>
    <start name="new_report" pkg="throwException/pkg/fs_report_server/2018-07-06-a">
    </start>
    <start name="test" pkg="throwException/pkg/report_connection/2018-07-06-l">
      <route>
        <service name="Report"> <child name="new_report"/> </service>
      </route>
      <config> <vfs> <fs/> </vfs> </config>
    </start>
  </config>
Hi Stefan,
thank you for the detailed steps, which make the situation much clearer. Actually, I can spot the problem without trying out the steps:
In your scenario, you change the routing rules for a component that is already running. The sculpt manager does not look at the routing rules of running components but passes the <route> content to the runtime (init) configuration as is. Init, in turn, responds to the routing change by killing and restarting the affected component, which is expected. From init's perspective, the new route is not valid because the server does not exist at this point (it is not yet part of the runtime config until it is completely installed). So the session request by the new instance of the client is denied.
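To make init's side of the story concrete, here is a strongly simplified C++ model. It assumes only what is described above: a changed route makes init restart the client, and a route to a child that is not part of the config denies the session. `Init_model` and the local `Service_denied` type are hypothetical stand-ins, not Genode's API.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for Genode's Service_denied exception.
struct Service_denied : std::runtime_error
{
    Service_denied() : std::runtime_error("Service_denied") { }
};

// Hypothetical, much simplified model of init's view of its configuration.
struct Init_model
{
    std::set<std::string>              children;     // <start> nodes in the config
    std::map<std::string, std::string> report_route; // client -> server
    std::map<std::string, int>         restarts;     // restart count per child

    // A changed <route> of a running child makes init restart that child.
    void apply_route(std::string const &client, std::string const &server)
    {
        auto it = report_route.find(client);
        if (it != report_route.end() && it->second != server)
            restarts[client]++;
        report_route[client] = server;
    }

    // The (re)started client requests its Report session. The route is
    // invalid if the target child is not (yet) part of the config.
    std::string open_report_session(std::string const &client) const
    {
        std::string const &server = report_route.at(client);
        if (children.count(server) == 0)
            throw Service_denied();
        return server;
    }
};
```

Replaying deploy config A and then B in this model, the route change from `temp_report` to `new_report` restarts `test`. Because `new_report` is not yet part of the config while its pkg is being installed, the new session request throws `Service_denied`.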
The same situation can be reproduced with the default config/deploy:
1. Uncomment fonts_fs, wm, backdrop, and nano3d. Wait until all components are downloaded and nano3d is running in a window.
2. Switch the networking to "local" to cut the connection to the internet.
3. Copy the <start> node of the wm, change the start name to "wm.2", and rename the pkg to some nonexistent name like "themed_wm.2". The log will show the message "...themed_wm.2 incomplete or missing".
4. Change the <route> of nano3d to <child name="wm.2"/>.
(Actually, steps 2 and 3 can be left out.)
It turns out that all components work as designed but I did not anticipate the dynamic change of <route> content in config/deploy. The sculpt manager looks only at this content as a precondition before starting a component but it does not monitor the information once it is satisfied. The relevant parts of the code are [1, 2].
[1] https://github.com/genodelabs/genode/blob/master/repos/gems/src/app/sculpt_m...
[2] https://github.com/genodelabs/genode/blob/master/repos/gems/src/app/depot_de...
In [2] you can see that a condition stays satisfied once it becomes satisfied for the first time. In principle, the sculpt manager could re-evaluate all <route> nodes (not just the incomplete ones) each time the runtime config is generated. Removing the early return should do the trick. I tested this with the simplified scenario outlined above.
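The difference between the sticky condition and the quickfix can be illustrated with a small C++ sketch. `Start_condition` and its two update functions are hypothetical; the sketch only assumes, as described above, that the original check returns early once the condition has been satisfied.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: child names found in the runtime's state report.
using Runtime_state = std::set<std::string>;

// Hypothetical model of the start condition of one deployed <start> node.
struct Start_condition
{
    bool satisfied = false;

    // Sticky variant: once satisfied, the routes are never looked at again.
    void update_sticky(std::vector<std::string> const &route_targets,
                       Runtime_state const &state)
    {
        if (satisfied)
            return;  // early return, the condition stays satisfied forever
        satisfied = all_present(route_targets, state);
    }

    // Quickfix variant: re-evaluate the routes on each config generation.
    void update_always(std::vector<std::string> const &route_targets,
                       Runtime_state const &state)
    {
        satisfied = all_present(route_targets, state);
    }

    static bool all_present(std::vector<std::string> const &targets,
                            Runtime_state const &state)
    {
        for (std::string const &t : targets)
            if (state.count(t) == 0)
                return false;
        return true;
    }
};
```

After the route of the running `test` client is switched to `new_report`, the sticky variant still reports the condition as satisfied, whereas the re-evaluating variant defers the start until `new_report` appears in the runtime state.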
However, this is just a quickfix because the change increases the XML parsing overhead inside the sculpt manager to a level I feel uncomfortable with. For each route, not just for the incomplete components but also for all running components, the condition check performs one parsing pass over the runtime's state report. Since the state report itself grows with the number of start nodes, the cost of the checks grows quadratically with the number of start nodes. So to solve it properly, we need to cache the runtime state inside the sculpt manager.
I'm going to implement that for release 18.08. Until then, I hope that the quickfix works for you as an interim solution.
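The caching idea can be sketched in C++ as well, assuming that the state report is reduced to a whitespace-separated list of child names instead of real XML. `State_report`, `route_satisfied_uncached`, and `Cached_runtime_state` are hypothetical names, and the counter `parse_passes` exists only to make the number of passes visible.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>

// Hypothetical stand-in for the runtime's state report: child names
// separated by whitespace instead of real XML.
struct State_report
{
    std::string text;
    mutable std::size_t parse_passes = 0;  // instrumentation only

    // One full pass over the report, collecting all child names.
    std::set<std::string> parse() const
    {
        ++parse_passes;
        std::set<std::string> names;
        std::istringstream in(text);
        for (std::string name; in >> name; )
            names.insert(name);
        return names;
    }
};

// Quickfix: every route check re-parses the whole report. With n start
// nodes and a report of size O(n), all checks together cost O(n^2).
bool route_satisfied_uncached(std::string const &target, State_report const &report)
{
    return report.parse().count(target) != 0;
}

// Cached variant: parse once per report update, then only look up names.
struct Cached_runtime_state
{
    std::set<std::string> names;

    void update(State_report const &report) { names = report.parse(); }

    bool route_satisfied(std::string const &target) const
    {
        return names.count(target) != 0;
    }
};
```

In this model, checking three routes costs three full passes without the cache but a single pass with it. The cached lookups only become stale if `update` is not called whenever the state report changes.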
Cheers Norman