Hello Stefan,
First, the depot autopilot uses significantly more distinct syscalls and I don't quite understand why that is. The syscalls used in addition to those of the basic test-log are: clone, getpid, sigaltstack, rt_sigaction, gettimeofday, nanosleep.
Can anyone explain why these are necessary?
You can spot these places by grepping for the syscalls within the base-linux repository. E.g.,
grep -ri gettimeofday repos/base-linux
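To check all six names in one go instead of running grep once per syscall, something along these lines might help. It is only a sketch: the function name find_syscall_mentions is made up for illustration, and it does a plain case-insensitive substring match like 'grep -i', so a name that the code spells differently will not show up.

```python
import os
import re

# Syscall names taken from the question above.
SYSCALLS = ["clone", "getpid", "sigaltstack", "rt_sigaction",
            "gettimeofday", "nanosleep"]

def find_syscall_mentions(root, names=SYSCALLS):
    """Map each name to the sorted list of files below root that mention it."""
    patterns = {n: re.compile(re.escape(n), re.IGNORECASE) for n in names}
    hits = {n: [] for n in names}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                # Ignore undecodable bytes so binary files do not abort the scan.
                with open(path, "r", errors="ignore") as f:
                    text = f.read()
            except OSError:
                continue  # unreadable file, skip it
            for name, pattern in patterns.items():
                if pattern.search(text):
                    hits[name].append(path)
    return {name: sorted(paths) for name, paths in hits.items()}
```

Calling find_syscall_mentions("repos/base-linux") from the top of the source tree returns a dictionary keyed by syscall name, so an empty list immediately points at a name that is spelled differently in the code.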
In short, 'clone', 'getpid', 'sigaltstack' (I just noticed that the code misspells this word as 'sigalstack'), and 'sigaction' are required by multi-threaded applications. They are needed to spawn a new thread in the local address space of a component.
The 'gettimeofday' system call is needed solely by the timer driver, for obvious reasons. Granted, it would be nice to restrict the use of this system call to only this component, but off the top of my head, I have no good idea how to accomplish such a distinction.
The 'nanosleep' system call appears in the timer driver and in a few places in the base library (winding down a thread, thread yield, and infinite blocking). Those latter cases may potentially be addressed by other means. I'm certainly open to suggestions. ;-)
The other problem I couldn't solve up to now is that the depot autopilot seems to use many more sessions than the scenario itself. Even for the basic test-log scenario, at least 512 sessions are used by a single process, as it fails by running out of socket descriptors when one socketpair per session is used.
The scenario is much more complex than an individual test, so a higher number of sessions is plausible. That said, I agree that a peak of 512 sessions is intuitively much higher than one would expect.
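For what it's worth, the figure of 512 would fit a descriptor limit of 1024, which is a common default soft limit on Linux, if both ends of each socketpair stay open in the same process. That last part is an assumption on my side, not something I verified. The arithmetic as a small sketch (max_sessions and current_soft_limit are made-up helper names):

```python
import resource

def max_sessions(fd_limit, fds_per_session=2, fds_in_use=0):
    """Number of sessions that fit before the descriptor limit is reached.

    Assumes each session costs fds_per_session descriptors in one process
    (2 if both ends of a socketpair are kept there).
    """
    return max(fd_limit - fds_in_use, 0) // fds_per_session

def current_soft_limit():
    """Soft limit on open file descriptors for the calling process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft
```

With a limit of 1024, max_sessions(1024) gives 512, and slightly less once stdin, stdout, and stderr are counted via fds_in_use=3. Comparing against current_soft_limit() on the test machine would show whether this explanation is even plausible there.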
Can anyone explain this behavior? Might there be stale sessions (leak) in the depot autopilot?
I have no explanation, unfortunately.
A leak of stale sessions seems unlikely to me. The nightly runs of depot_autopilot execute more than 80 tests (most of them consist of several components running) in sequence without hitting the regular FD limit on Linux. With the amount of component creation and destruction happening, a leak would presumably have bitten us already.
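If you want to rule out a leak on the host side, one option is to compare the number of open descriptors before and after an operation that is supposed to clean up after itself. The following is a generic sketch of that idea, not depot_autopilot code: open_fd_count and fd_delta are made-up names, and it relies on the per-process descriptor directory (/proc/self/fd on Linux).

```python
import os
import socket

def open_fd_count():
    """Count the descriptors currently open in this process.

    The listing itself may briefly hold one descriptor; that offset is the
    same on every call and cancels out when two counts are subtracted.
    """
    for fd_dir in ("/proc/self/fd", "/dev/fd"):
        if os.path.isdir(fd_dir):
            return len(os.listdir(fd_dir))
    raise OSError("no per-process descriptor directory found")

def fd_delta(action):
    """Run action() and return how many descriptors it left open."""
    before = open_fd_count()
    action()
    return open_fd_count() - before

def balanced_session():
    """Open a socketpair and close both ends again."""
    a, b = socket.socketpair()
    a.close()
    b.close()

leaked = []

def leaky_session():
    """Open a socketpair and keep it alive, imitating a stale session."""
    leaked.append(socket.socketpair())
```

fd_delta(balanced_session) should stay at zero no matter how often it is called, whereas fd_delta(leaky_session) grows by two per call. For a running component, watching the entry count of /proc/<pid>/fd over time gives the same information from outside.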
Cheers Norman