Hello
In a recent project (using Genode on top of OKL4 v2.1 on an ARM platform), I ran into the problem that one protection domain (PD) occasionally threw the exception Stack_alloc_failed. Tracing this back, it turned out to originate from a Region_conflict exception thrown by the RM session client code in response to the return value given by core's RM session server.
In that particular PD the main() function and the first started thread both created subthreads independently of each other. The exception occurred because an ATTACH command for the same thread stack address area was sent to core twice.
The thread stack allocation is done by the constructor Thread_base::Thread_base() via its second initializer, _context(_alloc_context(stack_size)). The method Thread_base::_alloc_context() handles the address assignment of a thread stack locally within a PD by calling _context_allocator()->alloc(this). Beginning at virtual address 0x4000.0000, a segment of 1 MB is reserved for each new stack, and the stack area of the requested size is allocated at the end of that segment. To find out whether a stack area is already in use, the so-called Context_allocator provides the method Thread_base::Context_allocator::_is_in_use().
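As a rough sketch of the address layout described above (not the actual Genode code), the attach address can be computed from a segment index and the size of the area to attach. The names AREA_BASE, SEGMENT_SIZE, attach_offset and stack_base are made up for illustration, and the 16 KB size in the usage note is an assumption chosen to match the example address 0x400f.c000 mentioned later in this mail.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants mirroring the description in this mail.
constexpr std::uint64_t AREA_BASE    = 0x40000000; // start of the PD's stack area
constexpr std::uint64_t SEGMENT_SIZE = 0x100000;   // 1 MB reserved per thread

// Offset of the attached area relative to AREA_BASE: the area of
// 'ds_size' bytes is placed at the end of segment number 'slot'.
std::uint64_t attach_offset(unsigned slot, std::uint64_t ds_size)
{
    assert(ds_size > 0 && ds_size <= SEGMENT_SIZE);
    return (slot + 1) * SEGMENT_SIZE - ds_size;
}

// Absolute virtual address of the attached area within the PD.
std::uint64_t stack_base(unsigned slot, std::uint64_t ds_size)
{
    return AREA_BASE + attach_offset(slot, ds_size);
}
```

With these assumptions, attach_offset(0, 0x4000) yields 0xfc000 and stack_base(0, 0x4000) yields 0x400fc000.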
The thread's context is maintained in the structure Thread_base::Context, which is written at the top of each allocated stack. The class Thread_base has the member _context, which points to this data. The Context_allocator itself is a static object instantiated on the first call to the function _context_allocator(). It maintains a linked list (member _threads) of all created threads. The method _is_in_use() walks through the thread list and checks whether any of the existing threads points to the context address of the stack area in question (thread member _context equal to the context address at the top of the stack area under consideration), returning true if this is the case. The method Thread_base::Context_allocator::alloc() iterates through the stack segments until it finds an unused stack area. It inserts the new thread into the linked list (call _threads.insert(&thread_base->_list_element);) and returns the stack's Context address to the caller Thread_base::_alloc_context().
The problem is that at this point a decision about a new stack allocation has been made that is not yet visible in the list of threads: the new thread is already in the list, but its member _context is not yet set. The assignment happens only when Thread_base::_alloc_context() returns to the constructor, but before that an IPC is made to core to register the new stack allocation. On OKL4, each IPC invokes the scheduler, and in the failure case another thread of the PD is scheduled, which starts instantiating a further new thread. In this situation, the method Thread_base::Context_allocator::_is_in_use() does not report the stack area chosen for the previously created thread as occupied, and the caller tries to allocate the same stack area a second time. However, core detects the double allocation and returns an error code for the second ATTACH command.
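The interleaving can be reproduced in a small single-threaded model. This is only a simplified illustration, not the Genode sources: Thread, Allocator and race_produces_duplicate are hypothetical names, slots stand in for context addresses, and the preemption during the IPC is modelled by simply calling alloc() a second time before the first result has been stored.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for Thread_base: 'context' is the slot index,
// -1 means that the member has not been assigned yet.
struct Thread { long context = -1; };

struct Allocator
{
    std::vector<Thread*> threads;

    // A slot counts as used only if some listed thread already points to it.
    bool is_in_use(long slot) const
    {
        for (const Thread *t : threads)
            if (t->context == slot) return true;
        return false;
    }

    // Finds a free slot and lists the thread, but does NOT set t->context;
    // in the model the caller does that later, as the constructor does.
    long alloc(Thread *t)
    {
        assert(t != nullptr);
        long slot = 0;
        while (is_in_use(slot)) ++slot;
        threads.push_back(t);
        return slot;
    }
};

// Replays the failure case: the second alloc() runs before the first
// thread's context has been assigned.
bool race_produces_duplicate()
{
    Allocator a;
    Thread t1, t2;
    long s1 = a.alloc(&t1);   // t1 is listed, t1.context still unset
    long s2 = a.alloc(&t2);   // "preemption": t2 allocates in between
    t1.context = s1;
    t2.context = s2;
    return s1 == s2;          // both got the same slot
}
```

In this model both threads end up with slot 0, which corresponds to the second ATTACH for the same address area.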
To fix the problem, a lock is required that covers the whole sequence from the start of the search for a free stack area up to the assignment of Thread_base::_context. The existing lock Thread_base::Context_allocator::_threads_lock, used in Thread_base::Context_allocator::alloc(), is not sufficient for this purpose. As a quick fix, I inserted the following two lines at the beginning of Thread_base::_alloc_context():
static Lock alloc_lock;
Lock::Guard _lock_guard(alloc_lock);
However, this solution is not perfect: the lock is released before the assignment to Thread_base::_context, which leaves a gap of a few machine instructions in which a preemption could still trigger the exception. In practice, it worked for my case.
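One way to close the remaining gap, sketched here with standard C++ primitives instead of Genode's Lock, is to perform the search, the list insertion, and the assignment of the context inside one critical section. Locked_allocator, alloc_and_assign and the slot model are assumptions for illustration; whether the real code can set _context before the ATTACH IPC returns is not something this sketch settles.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical stand-in for Thread_base: -1 means "no context assigned".
struct Thread { long context = -1; };

class Locked_allocator
{
    std::mutex           _lock;
    std::vector<Thread*> _threads;

    bool _is_in_use(long slot) const
    {
        for (const Thread *t : _threads)
            if (t->context == slot) return true;
        return false;
    }

public:
    // Search, assignment and list insertion happen under one lock, so a
    // concurrent caller always sees the slot as occupied.
    long alloc_and_assign(Thread &t)
    {
        std::lock_guard<std::mutex> guard(_lock);
        assert(t.context == -1);
        long slot = 0;
        while (_is_in_use(slot)) ++slot;
        t.context = slot;          // published before the lock is released
        _threads.push_back(&t);
        return slot;
    }
};
```

Calling alloc_and_assign() for two threads in a row hands out slots 0 and 1 in this model, regardless of where the caller is preempted afterwards.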
While analyzing this problem, I began to wonder about the overall design of the thread stack allocation. Obtaining the stack area is done by the call env_context_area_rm_session()->attach_at(ds_cap, attach_addr, ds_size) within the method Thread_base::_alloc_context(). The parameter attach_addr is not the complete stack base address (for instance 0x400f.c000), but the offset relative to the PD's stack area base address (for instance 0xf.c000). On the other hand, the function env_context_area_rm_session() instantiates a PD-wide RM session which attaches the whole address area of 256 MB at 0x4000.0000 for the PD. Doesn't that instruct the pager to provide memory on a page fault at any address between 0x4000.0000 and 0x4fff.ffff, thereby making the system unable to detect any stack overflow or underflow? Additionally, memory mappings are created for the stack offset addresses, which are not really used.
Maybe I missed something. If so, please let me know.
Regards
Frank