Dear Genodians,
I have noticed that at least when running on Linux, the fetchurl component started by depot_download_manager will always restart immediately after it is first executed (before anything has been downloaded). After that the download proceeds as normal.
This is because Fetchurl_watchdog uses trigger_periodic to trigger the timer handler which checks the current download progress. As a consequence, this handler /might/ be called immediately, before fetchurl has had the chance to do any work.
I believe this is a bug, but one that might not manifest on all platforms due to the uncertain timing introduced by trigger_periodic.
I have attached a potential patch, if this sounds okay to you I will open an Issue on GitHub.
Best regards, Timo
Hello Nicolai,
On Wed, Apr 12, 2023 at 09:05:21 CEST, Timo Nicolai wrote:
I have noticed that at least when running on Linux, the fetchurl component started by depot_download_manager will always restart immediately after it is first executed (before anything has been downloaded). After that the download proceeds as normal.
This is because Fetchurl_watchdog uses trigger_periodic to trigger the timer handler which checks the current download progress. As a consequence, this handler /might/ be called immediately, before fetchurl has had the chance to do any work.
Your investigation nails it: the first periodic timeout triggers (almost) immediately. I stumbled upon this undocumented behavior of periodic timeouts in a different context myself recently [1].
I believe this is a bug, but one that might not manifest on all platforms due to the uncertain timing introduced by trigger_periodic.
I have attached a potential patch, if this sounds okay to you I will open an Issue on GitHub.
AFAICS your patch fixes the issue for fetchurl. Nevertheless, I favor the use of periodic timeouts in proper contexts over oneshot timeouts. How about keeping the periodic timeout and handling the _initial_ state in Fetchurl_watchdog::_handle()?
[1] https://github.com/genodelabs/genode/commit/7feea78991285c6c3b3c0f83659b80c6...
Regards
Thank you Timo for pinpointing this issue and for providing a fix!
For tracking it, I've opened an issue [1].
[1] https://github.com/genodelabs/genode/issues/4815
On 2023-04-17 08:22, Christian Helmuth wrote:
AFAICS your patch fixes the issue for fetchurl. Nevertheless, I favor the use of periodic timeouts in proper contexts over oneshot timeouts. How about keeping the periodic timeout and handling the _initial_ state in Fetchurl_watchdog::_handle()?
In [2], I have now created a patch in this spirit. With this approach, the watchdog handler obtains the current time from the timer to get the ground truth instead of relying on the mere sequence of handler calls.
[2] https://github.com/genodelabs/genode/commit/b5371fe3215ea2ec9514530e97a0145b...
While being at it, I noticed that the depot_download.run script still triggered the watchdog mechanism when reducing the PERIOD_SECONDS value to 2 seconds, which made me curious. This effect was caused by the intertwining of watchdog signals and init-state signals. In cases where fetchurl exited with value 22 (due to HTTP error 404), the watchdog handler was sometimes executed right before the init-state handler had a chance to evaluate the exit code, causing unnecessary respawns. By reporting child exits more eagerly [3], this effect is mitigated.
[3] https://github.com/genodelabs/genode/issues/4814
Thanks again for bringing up this issue. It is nice to see the warning noise disappear. ;-)
Cheers Norman