Hi Frank,
In that particular PD the /main()/ function and the first started thread both created subthreads independently of each other. The exception occurred because an /ATTACH/ command for the same thread stack address area was sent to /core/ twice.
Thanks a lot for thoroughly analysing and describing the problem. This is indeed a race we need to fix. I will take your proposal as a starting point.
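To illustrate the kind of race in question, here is a minimal sketch of a slot allocator whose "find a free slot and mark it used" step is serialized by a lock, so that two threads creating subthreads concurrently can never pick the same stack slot. This is not the actual Genode code and not necessarily the proposed fix; the class name /Context_slot_allocator/, the slot count of 256, and the use of /std::mutex/ are all assumptions made for the example.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <optional>

// Hypothetical toy allocator, not Genode's implementation.
class Context_slot_allocator
{
    private:

        static constexpr std::size_t MAX_SLOTS = 256;  // assumed value

        std::bitset<MAX_SLOTS> _used;   // one bit per thread-context slot
        std::mutex             _mutex;  // serializes lookup and marking

    public:

        // Find a free slot and mark it used as one atomic step.
        // Returns the slot index, or std::nullopt if all slots are taken.
        std::optional<std::size_t> alloc()
        {
            std::lock_guard<std::mutex> guard(_mutex);
            for (std::size_t i = 0; i < MAX_SLOTS; ++i) {
                if (!_used[i]) {
                    _used[i] = true;
                    return i;
                }
            }
            return std::nullopt;
        }

        // Release a slot so that it can be handed out again.
        void free(std::size_t slot)
        {
            assert(slot < MAX_SLOTS);
            std::lock_guard<std::mutex> guard(_mutex);
            _used[slot] = false;
        }
};
```

Without the lock, two callers could both observe the same slot as free and both issue an attach for the same address, which matches the symptom described above.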
While analyzing this problem, I began wondering about the overall design of the thread stack allocation. The stack area is obtained by the call /env_context_area_rm_session()->attach_at(ds_cap, attach_addr, ds_size)/ within the method /Thread_base::_alloc_context()/. The parameter /attach_addr/ is not the complete stack base address (for instance 0x400f.c000), but the offset relative to the PD's stack area base address (for instance 0xf.c000). On the other hand, the function /env_context_area_rm_session()/ instantiates a PD-wide RM session which attaches the whole address area of 256 MB at 0x4000.0000 for the PD. Doesn't that instruct the pager to provide memory on a page fault at any address between 0x4000.0000 and 0x4fff.ffff, thereby making the system unable to detect any stack overflow or underflow? Additionally, memory mappings are created for the stack offset addresses, which are not really used. Maybe I missed something. If so, please let me know.
What you are seeing is the use of a managed dataspace. The complete thread context area (starting at address 0x40000000) is spanned by a single managed dataspace, which is actually another RM session (let's call it the sub rm-session). A RAM dataspace (i.e., a thread context including the stack) attached at offset X inside the sub rm-session will appear at 0x40000000 + X in the PD's address space. But the empty parts of the sub rm-session are not populated with actual memory. If a page fault occurs within a managed dataspace, core will traverse into the sub rm-session to find the actual backing-store dataspace for the fault offset within the sub rm-session. If there is no dataspace attached at the fault offset within the sub rm-session (e.g., if a stack overflows), core will print an error message and the faulting thread will be halted, which is the same behaviour as with any other unresolved page fault. So the thread context area is a sparsely populated part of the PD's address space. By using the managed dataspace, we prevent normal attachments (via env()->rm_session()) from colliding with the context area.
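The lookup described above can be modelled with a small toy example. It is only a sketch of the idea and not core's implementation: the names /Sub_rm_model/, /attach_at/ and /resolve_fault/ are hypothetical, and the dataspace size of 0x4000 used in the usage note is an assumed value. The base address 0x40000000, the 256 MB size, and the offset 0xfc000 are taken from the discussion above.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>

// Toy model only: hypothetical names, not Genode's API.
constexpr std::uint64_t CONTEXT_AREA_BASE = 0x40000000;  // start of context area
constexpr std::uint64_t CONTEXT_AREA_SIZE = 0x10000000;  // 256 MB

struct Sub_rm_model
{
    // offset within the sub rm-session -> size of the dataspace attached there
    std::map<std::uint64_t, std::uint64_t> attachments;

    // Attach a dataspace at 'offset'. Returns false if the range lies
    // outside the area or overlaps an existing attachment.
    bool attach_at(std::uint64_t offset, std::uint64_t size)
    {
        assert(size > 0);
        if (offset >= CONTEXT_AREA_SIZE || size > CONTEXT_AREA_SIZE - offset)
            return false;

        // first attachment starting at or after 'offset'
        auto next = attachments.lower_bound(offset);
        if (next != attachments.end() && next->first < offset + size)
            return false;

        // last attachment starting before 'offset'
        if (next != attachments.begin()) {
            auto prev = std::prev(next);
            if (prev->first + prev->second > offset)
                return false;
        }
        attachments[offset] = size;
        return true;
    }

    // Translate a faulting PD-local address into the offset of the backing
    // dataspace. std::nullopt models an unresolved page fault.
    std::optional<std::uint64_t> resolve_fault(std::uint64_t addr) const
    {
        if (addr < CONTEXT_AREA_BASE
         || addr - CONTEXT_AREA_BASE >= CONTEXT_AREA_SIZE)
            return std::nullopt;

        std::uint64_t const fault_offset = addr - CONTEXT_AREA_BASE;

        // last attachment starting at or before the fault offset
        auto it = attachments.upper_bound(fault_offset);
        if (it == attachments.begin())
            return std::nullopt;
        --it;

        if (fault_offset - it->first < it->second)
            return it->first;
        return std::nullopt;  // hole in the sparsely populated area
    }
};
```

With one dataspace of (assumed) size 0x4000 attached at offset 0xfc000, a fault at 0x400fd000 resolves to that dataspace, whereas a fault at 0x400fb000, just outside the attached range, stays unresolved. In this model, a second attach at the same offset is rejected.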
I hope this explanation clears things up a bit. Thank you again for pointing us to the context allocation problem! :-)
Best regards,
Norman