Genode: Thread Stack Allocation - users

25 Oct 2010


Hello
In a recent project (using Genode on top of OKL4 v2.1 on an ARM
platform), I ran into the problem that the exception Stack_alloc_failed
was occasionally thrown in one protection domain (PD). Tracing it back,
I found that it originated from a Region_conflict exception, which the
RM session client code threw in response to the return value from
core's RM session server.
In that particular PD, the main() function and the first thread it
started both created subthreads independently of each other. The
exception occurred because an ATTACH command for the same thread stack
address area was sent to core twice.
The thread stack allocation is done by the constructor
Thread_base::Thread_base() via its second initializer element,
_context(_alloc_context(stack_size)). The method
Thread_base::_alloc_context() assigns the address of a thread stack
locally within a PD by calling _context_allocator()->alloc(this).
Starting at virtual address 0x4000.0000, a segment of 1 MB is reserved
for each new stack, and the stack area of the requested size is
allocated at the end of that segment. To find out whether a stack area
is already in use, the so-called Context_allocator provides the method
Thread_base::Context_allocator::_is_in_use().
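The address arithmetic described above can be sketched as follows. This is a minimal model, not Genode code: the constant names and the helpers segment_base(), stack_base() and stack_offset() are hypothetical, and the 16 KB stack size used in the example is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout from the description: 1 MB segments from 0x4000.0000 on.
constexpr std::uintptr_t CONTEXT_AREA_BASE = 0x40000000;
constexpr std::uintptr_t SEGMENT_SIZE      = 0x100000;  // 1 MB

// Hypothetical helper: base address of the segment with the given index.
constexpr std::uintptr_t segment_base(unsigned index)
{
    return CONTEXT_AREA_BASE + index * SEGMENT_SIZE;
}

// Hypothetical helper: lowest address of a stack of 'stack_size' bytes
// placed at the upper end of segment 'index'.
constexpr std::uintptr_t stack_base(unsigned index, std::uintptr_t stack_size)
{
    return segment_base(index) + SEGMENT_SIZE - stack_size;
}

// Hypothetical helper: the same stack base expressed as an offset
// relative to the start of the context area.
constexpr std::uintptr_t stack_offset(unsigned index, std::uintptr_t stack_size)
{
    return stack_base(index, stack_size) - CONTEXT_AREA_BASE;
}
```

Under these assumptions, a 16 KB stack in the first segment starts at 0x400f.c000, i.e. at offset 0xf.c000 from the area base, and the second segment begins at 0x4010.0000.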
The thread's context is maintained in the structure Thread_base::Context,
which is written at the top of each allocated stack. The class
Thread_base has a member _context that points to this data. The
Context_allocator itself is a static object instantiated on the first
call to the function _context_allocator(). It maintains a linked list
(member _threads) of all created threads. The method _is_in_use() walks
through the thread list, checks whether any existing thread points to
the context address of the stack area in question (i.e., whether the
thread's member _context equals the context address at the top of that
stack area), and returns true if so. The method
Thread_base::Context_allocator::alloc() iterates over the stack
segments until it finds an unused stack area. It inserts the new thread
into the linked list (call _threads.insert(&thread_base->_list_element);)
and returns the stack's Context address to the caller,
Thread_base::_alloc_context().
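The bookkeeping of the Context_allocator can be modeled roughly like this. It is a simplified sketch, not the actual Genode implementation: Thread_model, Context_allocator_model and context_addr() are hypothetical names, std::mutex and std::list stand in for Genode's Lock and list types, and each slot is identified by its segment base rather than by the Context address at the top of the stack.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <mutex>

constexpr std::uintptr_t CONTEXT_AREA_BASE = 0x40000000;
constexpr std::uintptr_t SEGMENT_SIZE      = 0x100000;  // 1 MB
constexpr unsigned       MAX_THREADS       = 256;       // 256 MB / 1 MB

// Simplification: a slot is identified by its segment base address.
constexpr std::uintptr_t context_addr(unsigned index)
{
    return CONTEXT_AREA_BASE + index * SEGMENT_SIZE;
}

// Stand-in for Thread_base with only the member that matters here.
struct Thread_model
{
    std::uintptr_t context = 0;  // 0 = not assigned yet
};

class Context_allocator_model
{
    std::list<Thread_model *> _threads;       // all created threads
    std::mutex                _threads_lock;  // guards _threads

    // Does any listed thread already use this context address?
    bool _is_in_use(std::uintptr_t addr) const
    {
        for (Thread_model const *t : _threads)
            if (t->context == addr)
                return true;
        return false;
    }

public:
    // Returns the first free slot and lists the thread; the caller is
    // responsible for setting thread->context afterwards.
    std::uintptr_t alloc(Thread_model *thread)
    {
        std::lock_guard<std::mutex> guard(_threads_lock);
        for (unsigned i = 0; i < MAX_THREADS; ++i) {
            std::uintptr_t const addr = context_addr(i);
            if (!_is_in_use(addr)) {
                _threads.push_back(thread);
                return addr;
            }
        }
        return 0;  // no free slot; this model just reports 0
    }
};
```

As in the description, alloc() adds the thread to the list but leaves its context member for the caller to set; when the caller assigns it right away, consecutive allocations receive consecutive segments.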
The problem is that, at this point, a new stack allocation has been
decided on that is not yet visible in the list of threads: the new
thread is already in the list, but its member _context is not yet set.
_context is only assigned when Thread_base::_alloc_context() returns to
the constructor, and before that an IPC is made to core to register the
new stack allocation. On OKL4, each IPC invokes the scheduler. In the
failure case, another thread of the same PD is scheduled and starts
instantiating a further new thread. In this situation,
Thread_base::Context_allocator::_is_in_use() does not report the stack
area of the previously created thread as occupied, so the caller tries
to allocate the same stack area a second time. Core, however, detects
the double allocation and returns an error code on the second ATTACH
command.
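The failing interleaving can be replayed deterministically with a similar model, again purely hypothetical: Core_rm_model stands in for core's RM session and simply refuses to attach the same address twice, and the scheduler switch during the IPC is represented by running the second allocation before the first thread's context is assigned.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <mutex>
#include <set>

constexpr std::uintptr_t CONTEXT_AREA_BASE = 0x40000000;
constexpr std::uintptr_t SEGMENT_SIZE      = 0x100000;  // 1 MB
constexpr unsigned       MAX_THREADS       = 256;

struct Thread_model { std::uintptr_t context = 0; };  // 0 = unset

// Simplified allocator: lists the thread but leaves 'context' unset.
struct Allocator_model
{
    std::list<Thread_model *> threads;
    std::mutex                threads_lock;

    std::uintptr_t alloc(Thread_model *t)
    {
        std::lock_guard<std::mutex> guard(threads_lock);
        for (unsigned i = 0; i < MAX_THREADS; ++i) {
            std::uintptr_t addr = CONTEXT_AREA_BASE + i * SEGMENT_SIZE;
            bool in_use = false;
            for (Thread_model *o : threads)
                if (o->context == addr) in_use = true;
            if (!in_use) { threads.push_back(t); return addr; }
        }
        return 0;
    }
};

// Stand-in for core's RM session: rejects attaching the same region twice.
struct Core_rm_model
{
    std::set<std::uintptr_t> attached;

    // Returns false where core would report a region conflict.
    bool attach_at(std::uintptr_t addr) { return attached.insert(addr).second; }
};

// Replays the failing order of events without real threads.
// Returns true if both creators got the same slot and the 2nd attach failed.
bool replay_race()
{
    Allocator_model allocator;
    Core_rm_model   core;
    Thread_model    a, b;

    std::uintptr_t addr_a = allocator.alloc(&a);  // a listed, a.context == 0
    bool ok_a = core.attach_at(addr_a);           // IPC to core ...
    // ... the scheduler switches to the other creator before a.context is set
    std::uintptr_t addr_b = allocator.alloc(&b);  // a's slot still looks free
    bool ok_b = core.attach_at(addr_b);           // same region again
    a.context = addr_a;                           // assignments come too late
    b.context = addr_b;
    return addr_a == addr_b && ok_a && !ok_b;
}
```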
To fix the problem, a lock is required that covers the whole sequence,
from beginning the search for a free stack area up to the assignment of
Thread_base::_context. The existing lock
Thread_base::Context_allocator::_threads_lock, used in
Thread_base::Context_allocator::alloc(), is not sufficient for this
purpose. As a quick fix, I inserted the following two lines at the
beginning of Thread_base::_alloc_context():
    /* serializes all calls of _alloc_context() within this PD */
    static Lock alloc_lock;
    Lock::Guard _lock_guard(alloc_lock);
However, this solution is not perfect: the lock is released before the
assignment to Thread_base::_context, which leaves a gap of a few machine
instructions in which preemptive scheduling could still trigger the
exception. In practice, it worked for my case.
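One way to close the remaining gap is to keep the lock held until the context has actually been assigned. The sketch below collapses the constructor and _alloc_context() into a single hypothetical function, create_thread(), so that one guard covers the slot search, the attach to core, and the assignment. All names are assumptions, and std::mutex and std::thread stand in for Genode's Lock and threads.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

constexpr std::uintptr_t CONTEXT_AREA_BASE = 0x40000000;
constexpr std::uintptr_t SEGMENT_SIZE      = 0x100000;  // 1 MB
constexpr unsigned       MAX_THREADS       = 256;

struct Thread_model { std::uintptr_t context = 0; };  // 0 = unset

std::list<Thread_model *> threads;        // models Context_allocator::_threads
std::mutex                threads_lock;   // models _threads_lock
std::set<std::uintptr_t>  core_attached;  // models core's attached regions
std::mutex                core_lock;

// Slot search as in Context_allocator::alloc(); context stays unset here.
std::uintptr_t alloc_slot(Thread_model *t)
{
    std::lock_guard<std::mutex> guard(threads_lock);
    for (unsigned i = 0; i < MAX_THREADS; ++i) {
        std::uintptr_t addr = CONTEXT_AREA_BASE + i * SEGMENT_SIZE;
        bool in_use = false;
        for (Thread_model *o : threads)
            if (o->context == addr) in_use = true;
        if (!in_use) { threads.push_back(t); return addr; }
    }
    return 0;
}

// Models core's RM session: false means "region conflict".
bool core_attach_at(std::uintptr_t addr)
{
    std::lock_guard<std::mutex> guard(core_lock);
    return core_attached.insert(addr).second;
}

// The fix: one lock spans the slot search, the attach, AND the
// assignment of the thread's context.
bool create_thread(Thread_model *t)
{
    static std::mutex alloc_lock;
    std::lock_guard<std::mutex> guard(alloc_lock);

    std::uintptr_t addr = alloc_slot(t);
    if (addr == 0 || !core_attach_at(addr))
        return false;
    t->context = addr;  // assigned before the lock is released
    return true;
}

// Two creators (think main() and its first thread) each create n threads.
// 'pool' must hold at least 2 * n entries.
bool run_two_creators(std::vector<Thread_model> &pool, unsigned n)
{
    bool ok1 = true, ok2 = true;
    std::thread c1([&] { for (unsigned i = 0; i < n; ++i) ok1 &= create_thread(&pool[i]); });
    std::thread c2([&] { for (unsigned i = n; i < 2 * n; ++i) ok2 &= create_thread(&pool[i]); });
    c1.join();
    c2.join();
    return ok1 && ok2;
}
```

An alternative that would presumably also close the window is to record the context address in the thread inside alloc() itself, while _threads_lock is still held; a failed attach would then have to remove that entry again.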
While analyzing this problem, I began wondering about the overall design
of the thread stack allocation. The stack area is obtained by the call
env_context_area_rm_session()->attach_at(ds_cap, attach_addr, ds_size)
within the method Thread_base::_alloc_context(). The parameter
attach_addr is not the complete stack base address (for instance,
0x400f.c000) but the offset relative to the PD's stack area base address
(for instance, 0xf.c000). On the other hand, the function
env_context_area_rm_session() instantiates a PD-wide RM session that
attaches the whole address area of 256 MB at 0x4000.0000 for the PD.
Doesn't that instruct the pager to provide memory on a page fault at any
address between 0x4000.0000 and 0x4fff.ffff, thereby making the system
unable to detect any stack overflow or underflow? Additionally, memory
mappings are created for stack offset addresses that are not really
used.
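The address relationships raised in this question can be checked with a bit of arithmetic. The sketch only computes which addresses fall inside the 256 MB region attached at 0x4000.0000; whether the pager actually resolves a fault at such an address depends on what is attached within that region, which is not modeled here. The helper names to_offset(), in_context_area() and context_area_last() are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uintptr_t CONTEXT_AREA_BASE = 0x40000000;
constexpr std::uintptr_t CONTEXT_AREA_SIZE = 256u * 1024 * 1024;  // 256 MB

// Convert an absolute stack address into the offset handed to attach_at().
constexpr std::uintptr_t to_offset(std::uintptr_t addr)
{
    return addr - CONTEXT_AREA_BASE;
}

// True if 'addr' lies within the 256 MB region attached at CONTEXT_AREA_BASE.
constexpr bool in_context_area(std::uintptr_t addr)
{
    return addr >= CONTEXT_AREA_BASE && addr - CONTEXT_AREA_BASE < CONTEXT_AREA_SIZE;
}

// Last address covered by the attached region.
constexpr std::uintptr_t context_area_last()
{
    return CONTEXT_AREA_BASE + CONTEXT_AREA_SIZE - 1;
}
```

For example, an address just below a stack based at 0x400f.c000, where an overflowing (downward-growing) stack would write, still lies inside the attached region, while 0x5000.0000 does not.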
Maybe I missed something. If so, please let me know.
Regards
Frank