Re: Re: How to switch thread stack between threads?

Thu Jul 15 11:24:22 CEST 2021


> Sent: Wednesday, 14 July 2021 at 12:55
> From: "Alexander Tormasov via users" <users at lists.genode.org>
> To: "Genode users mailing list" <users at lists.genode.org>
> Cc: "Alexander Tormasov" <a.tormasov at innopolis.ru>
> Subject: Re: How to switch thread stack between threads?
>
> > I see that it is possible to have one thread with multiple stacks, using the alloc_secondary_stack() call.
> > But what if I want to switch a stack
> > from one OS thread to another one for setcontext()?
> >
> > The stack contains hidden data related to the Stack structure and to the native UTCB (completely opaque). If I try to use it, it looks up the stored structure that contains the thread reference, which points to the old thread. I need to point it to the new thread.
> >
> > In particular, this interferes with thread-local storage (TLS) data: the wrong content from the old thread is used, because the stack content is used for thread identification, and it fails…
> > 
> 
> For a better understanding of the technical design problem:
> when a Genode thread is created, a Thread object is constructed here:
> base/src/lib/base/thread.cc:201
> Thread::Thread(size_t weight, const char *name, size_t stack_size,
>                Type type, Cpu_session *cpu_session, Affinity::Location affinity)
> :
>     _cpu_session(cpu_session),
>     _affinity(affinity),
>     _trace_control(nullptr),
>     _stack(type == REINITIALIZED_MAIN ?
>            _stack : _alloc_stack(stack_size, name, type == MAIN))
> {
>     _init_platform_thread(weight, type);
> }
> 
> _alloc_stack allocates the stack from a predefined area in relatively big chunks and stores two additional pieces of thread-related data in the allocated stack:
> Stack *
> Thread::_alloc_stack(size_t stack_size, char const *name, bool main_thread)
> {
>     /* allocate stack */
>     Stack *stack = Stack_allocator::stack_allocator().alloc(this, main_thread);
> ...
>     /*
>      * Now the stack is backed by memory, so it is safe to access its members.
>      *
>      * We need to initialize the stack object's memory with zeroes, otherwise
>      * the ds_cap isn't invalid. That would cause trouble when the assignment
>      * operator of Native_capability is used.
>      */
>     construct_at<Stack>(stack, name, *this, ds_addr, ds_cap);
> 
>     Abi::init_stack(stack->top());
>     return stack;
> }
> 
> The Stack constructor invoked by construct_at<Stack> stores the following:
> 
>         Stack(Name const &name, Thread &thread, addr_t base,
>               Ram_dataspace_capability ds_cap)
>         :
>             _name(name), _thread(thread), _base(base), _ds_cap(ds_cap)
>         { }
> from base/src/include/base/internal/stack.h:
> /*
>  * \brief  Stack layout and organization
>  * \author Norman Feske
>  * \date   2006-04-28
>  *
>  * For storing thread-specific data such as the stack and thread-local data,
>  * there is a dedicated portion of the virtual address space. This portion is
>  * called stack area. Within this area, each thread has
>  * a fixed-sized slot. The layout of each slot looks as follows
>  *
>  * ; lower address
>  * ;   ...
>  * ;   ============================ <- aligned at the slot size
>  * ;
>  * ;             empty
>  * ;
>  * ;   ----------------------------
>  * ;
>  * ;             stack
>  * ;             (top)              <- initial stack pointer
>  * ;   ---------------------------- <- address of 'Stack' object
>  * ;       thread-specific data
>  * ;   ----------------------------
>  * ;              UTCB
>  * ;   ============================ <- aligned at the slot size
>  * ;   ...
>  * ; higher address
>  *
>  * On some platforms, a user-level thread-control block (UTCB) contains
>  * data shared between the user-level thread and the kernel. It is typically
>  * used for transferring IPC message payload or for system-call arguments.
>  * The additional stack members are a reference to the corresponding
>  * 'Thread' object and the name of the thread.
>  *
>  * The stack area is a virtual memory area, initially not backed by real
>  * memory. When a new thread is created, an empty slot gets assigned to the new
>  * thread and populated with memory pages for the stack and thread-specific
>  * data. Note that this memory is allocated from the RAM session of the
>  * component environment and not accounted for when using the 'sizeof()'
>  * operand on a 'Thread' object.
>  *
>  * A thread may be associated with more than one stack. Additional secondary
>  * stacks can be associated with a thread, and used for user level scheduling.
Did you see this ^ ? {g,s}etcontext() counts as user-level scheduling!
>  */
> 
> To locate TLS data, Genode uses the __emutls_get_address function from repos/base/src/lib/cxx/emutls.cc,
> which calls Thread::myself() from base/src/lib/base/thread_myself.cc to obtain a pointer to the Genode::Thread.
> Thread::myself() simply takes an address on the current stack and rounds it to find the hidden Stack structure stored there:
> Genode::Thread *Genode::Thread::myself()
> {
>     int dummy = 0; /* used for determining the stack pointer */
> 
> ...
>     addr_t base = Stack_allocator::addr_to_base(&dummy);
>     return &Stack_allocator::base_to_stack(base)->thread();
> }
> 
> So, if I switch to a stack that belonged to another OS thread at creation time, it will find a "foreign" Stack structure with the wrong Thread pointer. That pointer is used as the key in __emutls_get_address, which therefore returns the wrong address for variables with the TLS attribute (per OS thread).
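The lookup described above can be modelled outside Genode. The following is only a sketch under stated assumptions: SLOT_SIZE, Model_thread, Model_stack, init_slot() and myself_from() are hypothetical names, the slot size is arbitrary, and the UTCB is left out. Only the idea (mask the address down to the slot base, then read the record at the upper end of the slot) is taken from the quoted sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

/* hypothetical slot size, must be a power of two (Genode's value differs) */
constexpr std::size_t SLOT_SIZE = 4096;

struct Model_thread { int id; };                /* stands in for Genode::Thread */
struct Model_stack  { Model_thread *thread; };  /* stands in for the hidden Stack object */

/* model of the stack area: two slots, aligned at the slot size */
alignas(SLOT_SIZE) static unsigned char stack_area[2 * SLOT_SIZE];

/* round any address inside a slot down to the base of that slot */
inline std::uintptr_t addr_to_base(void const *addr)
{
    return reinterpret_cast<std::uintptr_t>(addr)
         & ~(static_cast<std::uintptr_t>(SLOT_SIZE) - 1);
}

/* the per-slot record lives at the upper end of the slot */
inline Model_stack *base_to_stack(std::uintptr_t base)
{
    return reinterpret_cast<Model_stack *>(base + SLOT_SIZE - sizeof(Model_stack));
}

/* bind a slot to a thread once, as done at thread creation */
inline Model_stack *init_slot(unsigned index, Model_thread &thread)
{
    std::uintptr_t const base =
        reinterpret_cast<std::uintptr_t>(stack_area) + index * SLOT_SIZE;
    return new (base_to_stack(base)) Model_stack{&thread};
}

/* counterpart of myself(): the result depends only on the address */
inline Model_thread *myself_from(void const *addr_on_stack)
{
    return base_to_stack(addr_to_base(addr_on_stack))->thread;
}
```

In this model, any address inside slot 1 resolves to the thread that slot 1 was initialized with, no matter which OS thread is executing. That is the "foreign" Stack effect in miniature.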
> 
> So, to fix this, when switching context to a non-local thread (via setcontext() or even longjmp()), I should update this data on the stack being switched to.
User-level switching is only valid within the same thread. The only way to do this is to make a local
user-level switch to a user-level thread that immediately blocks its own OS-level thread and wakes the OS-level
thread that corresponds to the target user-level thread and is blocked in the same procedure. On wakeup, the
user-level thread that was blocked at the OS level reads the target user-level thread and makes a local
user-level switch to it.
The mutex on which the user-level threads block (or at least its address) needs to be part of the context.
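A rough sketch of that handoff, using the C++ standard library instead of the Genode API. Parking_spot, switch_to_foreign() and run_handoff_demo() are hypothetical names, and the local user-level switch itself is only represented by an integer context id. The sketch shows nothing more than the block/wake ordering between the two OS threads.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

/* one per OS thread: where it blocks while none of its contexts should run */
struct Parking_spot
{
    std::mutex              mutex;
    std::condition_variable wakeup;
    int                     target = -1;  /* context id to resume, -1 = none */

    /* block the calling OS thread until a target is posted, then consume it */
    int park()
    {
        std::unique_lock<std::mutex> guard(mutex);
        wakeup.wait(guard, [this] { return target != -1; });
        int const resumed = target;
        target = -1;
        return resumed;
    }

    /* post a target and wake the OS thread parked here */
    void unpark(int context_id)
    {
        {
            std::lock_guard<std::mutex> guard(mutex);
            target = context_id;
        }
        wakeup.notify_one();
    }
};

/*
 * "Switch" to a context owned by another OS thread: wake its owner, then
 * block our own OS thread. The return value is the local context that
 * somebody later asks us to resume.
 */
inline int switch_to_foreign(Parking_spot &mine, Parking_spot &theirs, int context_id)
{
    theirs.unpark(context_id);
    return mine.park();
}

/* two OS threads hand control back and forth once */
inline std::vector<int> run_handoff_demo()
{
    Parking_spot spot1, spot2;
    std::vector<int> trace;

    std::thread os_thread2([&] {
        int const ctx = spot2.park();  /* blocked "in the same procedure" */
        trace.push_back(ctx);          /* a real local switch would happen here */
        spot1.unpark(10);              /* hand control back to OS thread 1 */
    });

    int const resumed = switch_to_foreign(spot1, spot2, 20);
    trace.push_back(resumed);
    os_thread2.join();
    return trace;
}
```

Each stack stays with the OS thread that created it, so Thread::myself() and TLS keep working. The cost is one block and one wakeup per cross-thread switch.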
> I see three problems here:
> 1. Updating the Genode::Stack content: although all the necessary fields are declared private, I could update at least the fields _thread, _native_thread and _utcb via accessor functions. Maybe it would be better to memcpy the Stack object's content to the stack being switched to?
> 2. Updating the content of the UTCB area, which is opaque. I am not 100% sure that simply memcpy-ing its content from the current stack before the switch will always work.
> 3. Locking: you have to be sure that nobody accesses your context data structure and overwrites it, and that the current stack data is not updated or cleared during the copy (e.g. if it is interrupted by the OS thread scheduler).
> 
> In theory, this could leak some data structures referenced from the stack (we just overwrite some fields that hold the only references), although this is not clear: if we have more stacks than OS threads and overwrite some of the related per-stack object pointers, can they leak?
> 
> This could also hurt performance (every setcontext() would have to dig inside the stack and, in some cases, copy data).
> 
> Maybe a more correct solution is not to store the "OS thread <-> Genode thread" association inside the stack, but to have a separate registry in which the native thread ID serves as the key for finding a stack address? This is not as portable as the current solution and requires implementing (generalising) something like a thread_id. In such an implementation, myself() would query the current thread_id, obtain the related stack address from the registry and use it later (e.g. to compare it with the current stack address).
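For illustration only, such a registry could look roughly like the following. This is not Genode code: Thread_registry, Registered_thread, enroll() and withdraw() are hypothetical names, and std::thread::id stands in for the native thread ID that would have to be generalised across kernels.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <unordered_map>

struct Registered_thread { int id; };  /* stands in for Genode::Thread */

/* maps the OS-level thread ID to the thread object, independent of the stack */
class Thread_registry
{
    private:

        std::mutex _mutex;
        std::unordered_map<std::thread::id, Registered_thread *> _threads;

    public:

        /* called once by each OS thread after it has started */
        void enroll(Registered_thread &thread)
        {
            std::lock_guard<std::mutex> guard(_mutex);
            _threads[std::this_thread::get_id()] = &thread;
        }

        /* called by an OS thread before it exits */
        void withdraw()
        {
            std::lock_guard<std::mutex> guard(_mutex);
            _threads.erase(std::this_thread::get_id());
        }

        /* counterpart of myself(): keyed by thread ID, not by stack address */
        Registered_thread *myself()
        {
            std::lock_guard<std::mutex> guard(_mutex);
            auto const it = _threads.find(std::this_thread::get_id());
            return it == _threads.end() ? nullptr : it->second;
        }
};
```

The lookup takes a lock and a hash-table access on every call, whereas the stack-based scheme needs only a mask and an addition. That illustrates the performance and portability trade-off mentioned above.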