Hello Genodians, I'm hoping someone can help or point me to a good reference for the error that pops up occasionally saying "attempt to handle the same signal context twice (nested)". This happens within some Sculpt components (decorator, depot_query), but also happens in this USB wifi driver I'm trying to make. Mostly I want to understand it - my long explanation is below.
My basic understanding is Genode gets upset when waiting for a signal within another signal handler. However, the "natural" way I would like this driver to behave is to be signal driven - all code is responding to some signal or other. However, sometimes one needs to do synchronous IO operations. If it is truly forbidden ever to wait within a signal handler, I considered a few options, but found all to be non-ideal.
1) Use stub signal handlers to insert elements into a work queue and drive everything from an event loop. This seems like reinventing the signal mechanism, but might be what's required.
2) Build state machines to break up synchronous IO operations with no waiting at all. This makes simple-looking operations complicated and increases debugging complexity.
3) Busy-wait for synchronous operations. Probably a big performance penalty, and obviously wasteful.
So I proceeded, because I do notice that sometimes it is OK to wait within a handler (i.e. the warning does not occur). But I still get the warning other times, and I can't quite figure out when or why it happens. Initially, I thought that if I separated things out, so that, say, Signal_context A waits, but can only receive signals for Signal_context B or C, it would be OK. Now, I am not totally sure I've actually achieved this, but I think that I have, and I've become suspicious that I don't really understand the "rules." So, is there a way to understand when one can safely wait within a signal handler? Is it really as simple as "Signal A cannot be generated while waiting if a handler for A is on the stack"? Does the App vs IO signal distinction come into play (I have both)? Also, is there a way to get a Genode app to output a stack trace, so that I can patch that warning message to output a stack trace and at least see what actually happened?
Thanks as always to the Genode community for sharing your knowledge with me.
Best regards, Colin
Hi Colin,
On 30.01.21 06:38, Colin Parker wrote:
My basic understanding is Genode gets upset when waiting for a signal within another signal handler. However, the "natural" way I would like this driver to behave is to be signal driven - all code is responding to some signal or other. However, sometimes one needs to do synchronous IO operations. If it is truly forbidden ever to wait within a signal handler, I considered a few options, but found all to be non-ideal.
1) Use stub signal handlers to insert elements into a work queue and drive everything from an event loop. This seems like reinventing the signal mechanism, but might be what's required.
2) Build state machines to break up synchronous IO operations with no waiting at all. This makes simple-looking operations complicated and increases debugging complexity.
3) Busy-wait for synchronous operations. Probably a big performance penalty, and obviously wasteful.
I agree with everything you wrote.
So I proceeded, because I do notice that sometimes it is OK to wait within a handler (i.e. the warning does not occur). But I still get the warning other times, and I can't quite figure out when or why it happens. Initially, I thought that if I separated things out, so that, say, Signal_context A waits, but can only receive signals for Signal_context B or C, it would be OK. Now, I am not totally sure I've actually achieved this, but I think that I have, and I've become suspicious that I don't really understand the "rules." So, is there a way to understand when one can safely wait within a signal handler? Is it really as simple as "Signal A cannot be generated while waiting if a handler for A is on the stack"? Does the App vs IO signal distinction come into play (I have both)?
Indeed. The distinction between I/O signal handlers and regular (application-level) signal handlers was introduced to address exactly this scenario.
Even though one should generally aspire to avoid the nesting of signal handlers, it can sometimes not be avoided for the reasons you stated. However, we observed that those situations show typical patterns.
- At the application level, the need for nesting signal handlers strongly hints at a design issue or bug. This is deliberately not supported by Genode. [Technically, it is still possible to implement such bad designs by using multiple entrypoints]
- Application-level signal handlers may perform I/O, which is perfectly reasonable. E.g., a 'handle_config' signal handler may perform file I/O. Internally, these I/O operations may use asynchronous ways of communication, involving the wait for a notification. Often, the application-level code cannot even tell whether a call into a library implicitly depends on asynchronous I/O or not.
- I/O signal handlers have a very narrow scope. In particular, they do not alter application-level state.
- While waiting for I/O to progress, the application is blocked. From the application's point of view, it looks like an atomic operation. The control flow of an I/O signal handler never enters any application code or touches application-level state.
Given these patterns, it is reasonable to give application-level code the ability to "poll" for I/O signals. The intention is always: need some I/O to make progress. The "polling" is not really busy polling but looks like this:
  while (condition_for_progress_unsatisfied()) {
      ep.wait_and_dispatch_one_io_signal();
  }
The 'wait_and_dispatch_one_io_signal' call can implicitly execute any I/O signal handler that happens to receive a signal, not just the one we wait for. However, once the interesting one triggers, its handler changes the result of 'condition_for_progress_unsatisfied'. So after the right signal has come in, the while loop finishes. Keep in mind that an I/O signal handler is supposed to never call application-level code.
While blocking in 'wait_and_dispatch_one_io_signal()', no application-level signal handler can execute.
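To make the loop pattern above concrete, here is a minimal, self-contained sketch that runs outside of Genode. 'Mock_entrypoint', 'sync_read', and 'demo_sync_read' are hypothetical names made up for this illustration. Only the method name 'wait_and_dispatch_one_io_signal' is borrowed from the thread, and the mock merely pops queued handlers instead of blocking, so treat it as a model of the control flow rather than of the real entrypoint.

```cpp
#include <cassert>
#include <deque>
#include <functional>

/* Hypothetical stand-in for an entrypoint. This is NOT the Genode API. */
struct Mock_entrypoint
{
    std::deque<std::function<void()>> pending_io;  /* queued I/O signal handlers */
    unsigned dispatched = 0;                       /* number of handlers executed */

    /*
     * Execute exactly one pending I/O handler. A real entrypoint would
     * block until a signal arrives; the mock expects one to be queued.
     */
    void wait_and_dispatch_one_io_signal()
    {
        assert(!pending_io.empty());
        auto handler = pending_io.front();
        pending_io.pop_front();
        handler();
        dispatched++;
    }
};

/* Synchronous-looking operation built on top of asynchronous completion */
inline unsigned sync_read(Mock_entrypoint &ep, bool const &read_complete)
{
    unsigned const before = ep.dispatched;

    /* any I/O handler may run here, not only the one we are waiting for */
    while (!read_complete)
        ep.wait_and_dispatch_one_io_signal();

    return ep.dispatched - before;
}

inline unsigned demo_sync_read()
{
    Mock_entrypoint ep;
    bool     read_complete = false;
    unsigned timer_ticks   = 0;

    /* an unrelated I/O signal, then the awaited one, then another unrelated one */
    ep.pending_io.push_back([&] { timer_ticks++; });
    ep.pending_io.push_back([&] { read_complete = true; });
    ep.pending_io.push_back([&] { timer_ticks++; });

    /* returns after two dispatches, the third signal stays queued */
    return sync_read(ep, read_complete);
}
```

Note that the handlers only flip I/O-level state ('read_complete', 'timer_ticks'). That is the property that keeps the wait looking atomic from the application's point of view.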
You can find several examples by grepping the source tree:
$ grep -r wait_and_dispatch_one_io_signal repos
For a good example, have a look at 'repos/os/include/os/vfs.h'. Many of the VFS utilities provide a convenient synchronous API by wrapping an asynchronous interface (the VFS).
The single nesting of processing I/O signals from the handler of an application-level signal is quite typical.
In rare circumstances, I/O signal handlers may depend on other I/O signal handlers. This is not beautiful but, given the narrow reach of an I/O signal handler, not strictly a bug. However, Genode still warns in these latter situations. These are the messages you have noticed.
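Taking the wording of the warning literally, the situation can be modelled with a simple re-entrancy guard per signal context. The sketch below is only a simplified model under that assumption and says nothing about how Genode implements the check. 'Mock_context', 'Nesting_checker', and 'demo_nesting' are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

/* Hypothetical signal context with a flag that marks "handler is on the stack" */
struct Mock_context
{
    std::string           name;
    std::function<void()> handler;
    bool                  in_dispatch = false;
};

/* Hypothetical dispatcher that refuses to enter the same context twice */
struct Nesting_checker
{
    std::vector<std::string> warnings;

    void dispatch(Mock_context &ctx)
    {
        if (ctx.in_dispatch) {
            warnings.push_back("nested dispatch of " + ctx.name);
            return;  /* do not re-enter the handler */
        }
        ctx.in_dispatch = true;
        ctx.handler();
        ctx.in_dispatch = false;
    }
};

inline std::vector<std::string> demo_nesting()
{
    Nesting_checker ep;
    Mock_context a, b;
    a.name = "A";
    b.name = "B";

    /* B does not wait for anything */
    b.handler = [] { };

    /* A waits while on the stack: first B arrives, then A arrives again */
    a.handler = [&] {
        ep.dispatch(b);  /* different context, accepted */
        ep.dispatch(a);  /* same context while A is still running, warned */
    };

    ep.dispatch(a);
    return ep.warnings;
}
```

In this model, a handler for A that waits is fine as long as only B or C fire in the meantime, and a warning appears as soon as A itself fires while its handler is still on the stack.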
I hope that you will find the mental model of "application-level" code versus "I/O-level" code helpful.
Also, is there a way to get a Genode app to output a stack trace, so that I can patch that warning message to output a stack trace and at least see what actually happened?
Please have a look at the 'os/backtrace.h' helper.
https://github.com/genodelabs/genode/blob/master/repos/os/include/spec/x86_6...
To get useful output, please put the following line into the file etc/tools.conf within your build directory.
CC_OPT += -fno-omit-frame-pointer
Cheers Norman
... and in case it's useful I'll add this:
https://chiselapp.com/user/ttcoder/repository/genode-book/wiki?name=Book:Tip...
This is a (hastily put together) list of hints I collected from the mailing-list and from Norman over time, showing what to do with the backtrace addresses and such.
Cedric
Hi Cedric,
On 06.02.21 09:48, ttcoder@netcourrier.com wrote:
... and in case it's useful I'll add this:
https://chiselapp.com/user/ttcoder/repository/genode-book/wiki?name=Book:Tip...
that's a really nice collection.
I have now taken my current line of work on the Pine64 as opportunity to present the most important practical hints in a new article:
https://genodians.org/nfeske/2021-02-11-pine-fun-debugging
Cheers Norman
Norman, Thanks for this, and also thanks to Cedric for the "unofficial" guide. Related to my original inquiry, I read that the log function "relies on a fair bit of framework infrastructure such as synchronization primitives," and I became curious if calling "log" as a debugging message from within an IO signal handler is possibly the source of my problem? This is supported by my having given up on the warning message and disabled the debug messages, only to see the warning message disappear. I am still doubtful that it is the true cause - I think that it's more likely that the shorter signal handler with no logging simply doesn't have time to trigger the bad behavior very often - but I became concerned that if indeed "log" is a problem if called during a signal handler, then my entire debugging strategy needs to be revisited. So it is worth asking someone who understands the signaling and IPC better than I do if log is OK during an IO signal handler (i.e. that log will not allow any other signal handlers to run)?
Regards,
Genode users mailing list users@lists.genode.org https://lists.genode.org/listinfo/users
Hi Colin,
On 17.02.21 15:19, Colin Parker wrote:
Related to my original inquiry, I read that the log function "relies on a fair bit of framework infrastructure such as synchronization primitives," and I became curious if calling "log" as a debugging message from within an IO signal handler is possibly the source of my problem?
No. It's safe to use the 'log' function from an I/O signal handler.
I am still doubtful that it is the true cause - I think that it's more likely that the shorter signal handler with no logging simply doesn't have time to trigger the bad behavior very often - but I became concerned that if indeed "log" is a problem if called during a signal handler, then my entire debugging strategy needs to be revisited.
I agree. The use of 'log' skews the performance quite a bit. You may consider the 'trace' mechanism instead, which greatly reduces the side effects by logging the output to a thread-local trace buffer. In particular, you may inspect the use of the trace_logger component. You can find an example scenario at os/recipes/pkg/test-trace_logger/.
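The idea of recording into a thread-local buffer instead of printing can be illustrated with a small ring buffer. This is a sketch of the general technique only. It does not reproduce Genode's trace-buffer format or the interface of the trace_logger component, and 'Trace_ring', 'local_trace', 'TRACE_CAPACITY', and 'demo_trace' are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

/* Hypothetical capacity, chosen small to make the wrap-around visible */
constexpr std::size_t TRACE_CAPACITY = 4;

/* Fixed-size ring buffer that keeps only the most recent records */
class Trace_ring
{
    std::array<std::string, TRACE_CAPACITY> _entries;
    std::size_t _count = 0;  /* total number of records ever written */

public:

    /* Recording is a plain memory write, no output path is involved */
    void record(std::string msg)
    {
        _entries[_count % TRACE_CAPACITY] = std::move(msg);
        _count++;
    }

    /* Return the retained records ordered from oldest to newest */
    std::vector<std::string> snapshot() const
    {
        std::vector<std::string> out;
        std::size_t const n = (_count < TRACE_CAPACITY) ? _count : TRACE_CAPACITY;
        for (std::size_t i = _count - n; i < _count; i++)
            out.push_back(_entries[i % TRACE_CAPACITY]);
        return out;
    }
};

/* One buffer per thread, so recording needs no locking */
inline Trace_ring &local_trace()
{
    thread_local Trace_ring ring;
    return ring;
}

inline std::vector<std::string> demo_trace()
{
    /* six records go into four slots, so the two oldest are overwritten */
    for (int i = 0; i < 6; i++)
        local_trace().record("event " + std::to_string(i));

    return local_trace().snapshot();
}
```

The records would be read out later, outside the timing-sensitive path, which is why this style of logging disturbs the observed behavior far less than printing from within the handler.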
Cheers Norman