Hi Colin,
On 30.01.21 06:38, Colin Parker wrote:
Hello Genodians, I'm hoping someone can help or point me to a good reference for the error that pops up occasionally saying "attempt to handle the same signal context twice (nested)". This happens within some Sculpt components (decorator, depot_query), but also happens in this USB wifi driver I'm trying to make. Mostly I want to understand it - my long explanation is below.
My basic understanding is Genode gets upset when waiting for a signal within another signal handler. However, the "natural" way I would like this driver to behave is to be signal driven - all code is responding to some signal or other. However, sometimes one needs to do synchronous IO operations. If it is truly forbidden ever to wait within a signal handler, I considered a few options, but found all to be non-ideal.
1) Use stub signal handlers to insert elements into a work queue and drive everything from an event loop. This seems like reinventing the signal mechanism, but might be what's required.
2) Build state machines to break up synchronous IO operations with no waiting at all. This makes simple-looking operations complicated and increases debugging complexity.
3) Busy-wait for synchronous operations. Probably a big performance penalty, and obviously wasteful.
I agree with everything you wrote.
So I proceeded, because I do notice that sometimes it is OK to wait within a handler (i.e. the warning does not occur). But I still get the warning other times, and I can't quite figure out when or why it happens. Initially, I thought that if I separated things out, so that, say, Signal_context A waits, but can only receive signals for Signal_context B or C, it would be OK. Now, I am not totally sure I've actually achieved this, but I think that I have, and I've become suspicious that I don't really understand the "rules." So, is there a way to understand when one can safely wait within a signal handler? Is it really as simple as "Signal A cannot be generated while waiting if a handler for A is on the stack"? Does the App vs IO signal distinction come into play (I have both)?
Indeed. The distinction between I/O signal handlers and regular (application-level) signal handlers was introduced to address exactly this scenario.
Even though one should generally aspire to avoid the nesting of signal handlers, it can sometimes not be avoided for the reasons you stated. However, we observed that those situations show typical patterns.
- At the application level, the need for nesting signal handlers strongly hints at a design issue or bug. This is deliberately not supported by Genode. [Technically, it is still possible to implement such bad designs by using multiple entrypoints]
- Application-level signal handlers may perform I/O, which is perfectly reasonable. E.g., a 'handle_config' signal handler may perform file I/O. Internally, these I/O operations may use asynchronous ways of communication, involving the wait for a notification. Often, the application-level code cannot even tell whether a call into a library implicitly depends on asynchronous I/O or not.
- I/O signal handlers have a very narrow scope. In particular, they do not alter application-level state.
- While waiting for I/O to progress, the application is blocked. From the application's point of view, it looks like an atomic operation. The control flow of an I/O signal handler never enters any application code or touches application-level state.
Given these patterns, it is reasonable to give application-level code the ability to "poll" for I/O signals. The intention is always: need some I/O to make progress. The "polling" is not really busy polling but looks like this:
  while (condition_for_progress_unsatisfied()) {
    ep.wait_and_dispatch_one_io_signal();
  }
The 'wait_and_dispatch_one_io_signal' call can implicitly execute any I/O signal handler that happens to receive a signal, not just the one we wait for. However, once the interesting one triggers, its handler changes the result of 'condition_for_progress_unsatisfied()'. So after the right signal came in, the while loop finishes. Keep in mind that an I/O signal handler is supposed to never call application-level code.
While blocking in 'wait_and_dispatch_one_io_signal()', no application-level signal handler can execute.
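To make the control flow of this loop concrete, here is a small toy model in plain C++. It is only a sketch and not Genode code: 'Toy_entrypoint', 'Level', 'Pending_signal', 'dispatch_one', 'Result', and 'run_demo' are names made up for illustration, and the toy 'wait_and_dispatch_one_io_signal' returns false instead of blocking when nothing is pending. It merely mimics the rule described above: while an application-level handler waits for I/O, only I/O-level handlers are dispatched, and application-level signals stay queued until the handler has returned.

```cpp
#include <deque>
#include <functional>
#include <utility>

/* Toy model only: none of these types are part of the Genode API. */
enum class Level { App, Io };

struct Pending_signal
{
    Level                 level;
    std::function<void()> handler;
};

class Toy_entrypoint
{
    std::deque<Pending_signal> _pending { };

    public:

        void submit(Level level, std::function<void()> handler)
        {
            _pending.push_back({ level, std::move(handler) });
        }

        /*
         * Dispatch the oldest pending I/O-level signal and leave all
         * application-level signals in the queue. Returns false if no
         * I/O signal is pending (a real entrypoint would block here).
         */
        bool wait_and_dispatch_one_io_signal()
        {
            for (auto it = _pending.begin(); it != _pending.end(); ++it) {
                if (it->level != Level::Io)
                    continue;
                auto handler = std::move(it->handler);
                _pending.erase(it);
                handler();
                return true;
            }
            return false;
        }

        /* One step of the regular dispatch loop: any level, in order */
        bool dispatch_one()
        {
            if (_pending.empty())
                return false;
            auto handler = std::move(_pending.front().handler);
            _pending.pop_front();
            handler();
            return true;
        }
};

struct Result
{
    bool read_complete   = false;  /* the condition we wait for        */
    int  unrelated_io    = 0;      /* other I/O handlers that ran      */
    int  app_during_wait = 0;      /* app handlers run while waiting   */
    int  app_after_wait  = 0;      /* app handlers run after returning */
};

Result run_demo()
{
    Toy_entrypoint ep;
    Result         r;
    bool           waiting = false;

    /* application-level handler that needs synchronous I/O */
    ep.submit(Level::App, [&] {

        /* signals that arrive while the handler is executing */
        ep.submit(Level::Io,  [&] { ++r.unrelated_io; });
        ep.submit(Level::App, [&] {
            if (waiting) ++r.app_during_wait; else ++r.app_after_wait; });
        ep.submit(Level::Io,  [&] { r.read_complete = true; });

        waiting = true;
        while (!r.read_complete)
            if (!ep.wait_and_dispatch_one_io_signal())
                break;  /* toy only: nothing left to wait for */
        waiting = false;
    });

    /* regular dispatch loop */
    while (ep.dispatch_one()) { }

    return r;
}
```

Calling run_demo() in this toy yields read_complete == true, unrelated_io == 1, app_during_wait == 0, and app_after_wait == 1: the unrelated I/O handler ran inside the loop, whereas the queued application-level handler ran only after the waiting handler had returned.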
You can find several examples by grepping the source tree:
$ grep -r wait_and_dispatch_one_io_signal repos
For a good example, have a look at 'repos/os/include/os/vfs.h'. Many of the VFS utilities provide a convenient synchronous API by wrapping an asynchronous interface (the VFS).
The single nesting of processing I/O signals from the handler of an application-level signal is quite typical.
In rare circumstances, I/O signal handlers may depend on other I/O signal handlers. This is not beautiful but, given the narrow reach of an I/O signal handler, not strictly a bug. However, Genode still warns in these latter situations. These are the messages you have noticed.
I hope that you will find the mental model of "application-level" code versus "I/O-level" code helpful.
Also, is there a way to get a Genode app to output a stack trace - I can patch that warning message to output a stack trace and at least see what actually happened?
Please have a look at the 'os/backtrace.h' helper.
https://github.com/genodelabs/genode/blob/master/repos/os/include/spec/x86_6...
To get useful output, please put the following line into the file etc/tools.conf within your build directory.
CC_OPT += -fno-omit-frame-pointer
Cheers Norman