Stack overflow protection/detection

Tue Aug 30 21:40:40 CEST 2016

Norman, thanks a lot for the clarification!

I must admit that I only had a rather brief look at the existing code.

Your explanation still leaves me puzzled with one little question though:
Where does the "invalid signal-context capability" come from? I
actually noticed that a couple of times in the past and was wondering
what's causing this.

On Tue, 30 Aug 2016 20:53:34 +0200
Norman Feske <norman.feske at ...1...> wrote:

> Hi Johannes,
> 
> I'm afraid that you misinterpreted the role of the "stack allocator".
> Stacks are actually not allocated consecutively but within a sparsely
> populated area (called stack area) within the component's virtual
> memory space.
> 
> We introduced the current stack allocation scheme back in Genode
> 10.02:
> 
> 
> http://genode.org/documentation/release-notes/10.02#New_thread-context_management
> 
> In short, the stack allocator is used to allocate slots within the
> stack area, which hosts all the stacks. Each slot is 1 MiB of virtual
> memory, aligned to a 1 MiB boundary. The actual stack (typically just
> a few KiB) is placed within the slot but most of the slot remains
> unpopulated. Consequently, guard pages are already in place - plenty
> of them.
> 
> The only thing that changed since 10.02 is the naming. I removed the
> notion of the "thread context area" earlier this year and just speak
> of "stack area" instead. This was done to simplify the terminology
> used within the framework's implementation.
> 
> > Stack overflows are not only very annoying and time consuming but
> > can (imo) also be mitigated rather easily. I therefore think it
> > would be worth implementing a protection or detection mechanism for
> > this in Genode.  
> 
> Usually, when a stack overflows, you get a message indicating that an
> unresolvable page fault has occurred within the virtual-memory range of
> the stack area. On base-linux, the address can be found in the dmesg
> output. On the other kernels, core's pager prints a message (mostly
> accompanied by something like "invalid signal-context capability").
> 
> I doubt that stack corruptions were the reason for the trouble you
> observed. I can vividly remember nerve-wracking bug hunting sessions
> prior to version 10.02 that were caused by stack overflows corrupting
> adjacent memory, but this hasn't been an issue since then.
> 
> > Alternatively, I can imagine a kernel-level (base-hw) approach
> > which uses canaries at the top of each stack. Every time the kernel
> > switches to a user thread, it checks whether the canary is still
> > alive. If not, another thread's stack must have overflowed. Of
> > course, this method is only reliable if we can assume that every
> > memory word on the stack will be initialised (preferably
> > sequentially).   
> 
> Stack canaries are actually a good idea, which we will investigate in
> the near future - not to counter stack overflows but as a protection
> measure against deliberate stack-smashing attacks.
> 
> Cheers
> Norman
>
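
To make the slot layout described above concrete, here is a minimal sketch in C of the address arithmetic it implies. Only the 1 MiB slot size and alignment come from the mail. The base address of the stack area, the 16 KiB stack size, the placement of the stack at the upper end of its slot, and all function names are assumptions made up for illustration; this is not Genode's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* From the mail: each slot is 1 MiB of virtual memory, aligned to 1 MiB. */
#define SLOT_SIZE ((uintptr_t)1 << 20)

/* Hypothetical values, chosen for illustration only. */
#define STACK_AREA_BASE ((uintptr_t)0x40000000)  /* must be 1 MiB aligned */
#define STACK_SIZE      ((uintptr_t)16 * 1024)   /* "typically just a few KiB" */

/* Base of the slot containing addr. Masking works because every slot is
 * aligned to its own (power-of-two) size. */
static uintptr_t slot_base(uintptr_t addr)
{
    return addr & ~(SLOT_SIZE - 1);
}

/* Index of the slot containing addr, counted from the start of the area. */
static unsigned slot_index(uintptr_t addr)
{
    assert(addr >= STACK_AREA_BASE);
    return (unsigned)((addr - STACK_AREA_BASE) / SLOT_SIZE);
}

/* Assumption: the stack occupies the upper STACK_SIZE bytes of its slot and
 * grows downwards. Everything below it in the slot is unpopulated, so an
 * access there cannot be resolved and shows up as a page fault. */
static bool is_populated(uintptr_t addr)
{
    uintptr_t top = slot_base(addr) + SLOT_SIZE;
    return addr >= top - STACK_SIZE;
}
```

Under these assumptions, a stack pointer that runs below the populated part of its slot still has almost 1 MiB of unmapped addresses to cross before it could reach the stack in the slot beneath it. That unmapped remainder is what acts as the guard region: the overflow faults inside the same slot instead of silently corrupting a neighbouring stack.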