Hi Stefan,
unfortunately, I cannot reproduce the problem as it was some time ago and I don't remember the exact situation.
On Wed, 31 Aug 2016 13:15:44 +0200 Stefan Kalkowski <stefan.kalkowski@...1...> wrote:
Hi Johannes,
On 08/30/2016 09:40 PM, Johannes Schlatow wrote:
Norman, thanks a lot for the clarification!
I must admit that I only had a rather brief look at the existing code.
Your explanation still leaves me puzzled with one little question though: Where does the "invalid signal-context capability" come from? I actually noticed that a couple of times in the past and was wondering what's causing this.
"invalid signal-context capability" is printed when someone uses an invalid capability (e.g., because no signal handler was set) to submit a signal. It can be printed for many reasons. Typically you can see it when a fault in the thread context area cannot be resolved.
I was wondering: which kernel are you using right now? Because I also stumbled across the problem that on top of certain kernels (e.g., Fiasco old, Pistachio, seL4...) we do not print an error message when a page fault cannot be resolved within a managed region-map area (e.g., within the thread context area). I will open an issue for this on GitHub.
Regards Stefan
On Tue, 30 Aug 2016 20:53:34 +0200 Norman Feske <norman.feske@...1...> wrote:
Hi Johannes,
I'm afraid that you misinterpreted the role of the "stack allocator". Stacks are actually not allocated consecutively but within a sparsely populated area (called stack area) within the component's virtual memory space.
We introduced the current stack allocation scheme back in Genode 10.02:
http://genode.org/documentation/release-notes/10.02#New_thread-context_manag...
In short, the stack allocator is used to allocate slots within the stack area, that hosts all the stacks. Each slot is 1 MiB of virtual memory, aligned to a 1 MiB boundary. The actual stack (typically just a few KiB) is placed within the slot but most of the slot remains unpopulated. Consequently, guard pages are already in place - plenty of them.
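To illustrate the slot arithmetic described above, here is a minimal sketch. Only the 1 MiB slot size and alignment are taken from the explanation. The base address, the number of slots, the placement of the stack at the upper end of its slot, and all names are assumptions made for the example, not Genode's actual implementation.

```cpp
#include <cassert>
#include <cstdint>

// 1 MiB per slot, aligned to 1 MiB (from the text above).
constexpr std::uintptr_t SLOT_SIZE = 1UL << 20;

// Assumed values for illustration only, not Genode's real layout.
constexpr std::uintptr_t AREA_BASE = 0x40000000UL;  // hypothetical, 1 MiB aligned
constexpr std::uintptr_t NUM_SLOTS = 256;           // hypothetical slot count

// True if 'addr' lies anywhere within the (sparsely populated) stack area.
constexpr bool in_stack_area(std::uintptr_t addr)
{
    return addr >= AREA_BASE && addr - AREA_BASE < NUM_SLOTS * SLOT_SIZE;
}

// Because slots are aligned to their size, masking the low bits of any
// address inside a slot yields the base address of that slot.
constexpr std::uintptr_t slot_base(std::uintptr_t addr)
{
    return addr & ~(SLOT_SIZE - 1);
}

// Index of the slot that contains 'addr' (caller must check in_stack_area).
constexpr std::uintptr_t slot_index(std::uintptr_t addr)
{
    return (addr - AREA_BASE) / SLOT_SIZE;
}

// Assumption: the stack occupies the uppermost 'stack_size' bytes of its
// slot and grows downward. Everything below stays unpopulated, so an
// overflow faults instead of corrupting another stack.
constexpr bool in_populated_stack(std::uintptr_t addr, std::uintptr_t stack_size)
{
    return slot_base(addr) + SLOT_SIZE - addr <= stack_size;
}
```

With a 16 KiB stack, for example, an access just below the top of a slot hits populated memory, while an access in the middle of the same slot lands in the unpopulated part and would show up as an unresolvable page fault within the stack area.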
The only thing that changed since 10.02 is the naming. I removed the notion of the "thread context area" earlier this year and just speak of "stack area" instead. This was done to simplify the terminology used within the framework's implementation.
Stack overflows are not only very annoying and time-consuming but can (imo) also be mitigated rather easily. I therefore think it would be worth implementing a protection or detection mechanism for this in Genode.
Usually, when a stack overflows, you get a message indicating that an unresolvable page fault has occurred within the virtual-memory range of the stack area. On base-linux, the address can be found in the dmesg output. On the other kernels, core's pager prints a message (mostly accompanied by something like "invalid signal-context capability").
I doubt that stack corruptions were the reason for the trouble you observed. I can vividly remember nerve-wracking bug-hunting sessions prior to version 10.02 that were caused by stack overflows corrupting adjacent memory, but this hasn't been an issue since then.
Alternatively, I can imagine a kernel-level (base-hw) approach which uses canaries at the top of each stack. Every time the kernel switches to a user thread, it checks whether the canary is still alive. If not, another thread's stack must have overflowed. Of course, this method is only reliable if we can assume that every memory word on the stack will be initialised (preferably sequentially).
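The canary idea above could be sketched roughly as follows. This is a user-level model under stated assumptions, not base-hw code: stacks are modelled as adjacent word arrays, and the marker value, the Stack_model type, and all function names are hypothetical. In a kernel, the check would run each time it switches to a user thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Arbitrary marker value (assumption); any unlikely bit pattern works.
constexpr std::uint32_t CANARY = 0xDEADC0DE;

// Hypothetical model of one stack as a range of memory words.
struct Stack_model
{
    std::uint32_t *words;  // lowest address of the stack
    std::size_t    count;  // number of words in the stack
};

// The canary lives in the topmost word of the stack. The usable stack
// would start right below it.
inline std::uint32_t &canary_slot(Stack_model const &s)
{
    assert(s.count > 0);
    return s.words[s.count - 1];
}

// Called once when the stack is set up.
inline void plant_canary(Stack_model const &s) { canary_slot(s) = CANARY; }

// Called on every switch to the thread that owns 's'. A dead canary means
// that the stack located directly above has grown down into this one.
inline bool canary_alive(Stack_model const &s) { return canary_slot(s) == CANARY; }
```

Note that this only detects an overflow that actually writes to the canary word, which is the reliability caveat mentioned above.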
Stack canaries are actually a good idea, which we will investigate in the near future, not to counter stack overflows but as a protection measure against deliberate stack-smashing attacks.
Cheers Norman
genode-main mailing list genode-main@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/genode-main