How to switch thread stack between threads?


Alexander Tormasov

14 Jul 2021

2:12 a.m.

Hello, I see that a single thread can have multiple stacks, using the alloc_secondary_stack() call. But what if I want to switch a stack from one OS thread to another one for setcontext()?

The stack contains hidden data related to the Stack structure and to the native UTCB (completely opaque). If I try to use the stack from another thread, the code looks up the stored structure, which contains a thread reference that still points to the old thread. I need to point it to the new thread.

In particular, this interferes with thread-local storage (TLS) data: the wrong content from the old thread is used, because the stack content is used for thread identification, and the code fails…

How can I do this, any ideas? I can't re-create it, since it contains the state of the thread, and copying the stack content during every context switch is not the best idea…

Alexander

Sent from iPhone


Alexander Tormasov

14 Jul

12:55 p.m.

For a better understanding of the technical design problem: when a Genode thread is created, a Thread object is constructed in base/src/lib/base/thread.cc:201:

    Thread::Thread(size_t weight, const char *name, size_t stack_size,
                   Type type, Cpu_session *cpu_session,
                   Affinity::Location affinity)
    :
        _cpu_session(cpu_session),
        _affinity(affinity),
        _trace_control(nullptr),
        _stack(type == REINITIALIZED_MAIN ?
               _stack : _alloc_stack(stack_size, name, type == MAIN))
    {
        _init_platform_thread(weight, type);
    }

_alloc_stack allocates the stack from a pre-defined area in relatively big chunks, and stores two additional pieces of thread-related data in the allocated stack:

    Stack *
    Thread::_alloc_stack(size_t stack_size, char const *name, bool main_thread)
    {
        /* allocate stack */
        Stack *stack = Stack_allocator::stack_allocator().alloc(this, main_thread);
        ...
        /*
         * Now the stack is backed by memory, so it is safe to access its members.
         *
         * We need to initialize the stack object's memory with zeroes, otherwise
         * the ds_cap isn't invalid. That would cause trouble when the assignment
         * operator of Native_capability is used.
         */
        construct_at<Stack>(stack, name, *this, ds_addr, ds_cap);

        Abi::init_stack(stack->top());
        return stack;
    }

The Stack constructor invoked by construct_at<Stack> stores the following:

    Stack(Name const &name, Thread &thread, addr_t base,
          Ram_dataspace_capability ds_cap)
    :
        _name(name), _thread(thread), _base(base), _ds_cap(ds_cap)
    { }

From base/src/include/base/internal/stack.h:

    /*
     * \brief  Stack layout and organization
     * \author Norman Feske
     * \date   2006-04-28
     *
     * For storing thread-specific data such as the stack and thread-local data,
     * there is a dedicated portion of the virtual address space. This portion is
     * called stack area. Within this area, each thread has
     * a fixed-sized slot. The layout of each slot looks as follows
     *
     * ; lower address
     * ;   ...
     * ;   ============================ <- aligned at the slot size
     * ;
     * ;             empty
     * ;
     * ;   ----------------------------
     * ;
     * ;             stack
     * ;             (top)              <- initial stack pointer
     * ;   ---------------------------- <- address of 'Stack' object
     * ;       thread-specific data
     * ;   ----------------------------
     * ;              UTCB
     * ;   ============================ <- aligned at the slot size
     * ;   ...
     * ; higher address
     *
     * On some platforms, a user-level thread-control block (UTCB) contains
     * data shared between the user-level thread and the kernel. It is typically
     * used for transferring IPC message payload or for system-call arguments.
     * The additional stack members are a reference to the corresponding
     * 'Thread' object and the name of the thread.
     *
     * The stack area is a virtual memory area, initially not backed by real
     * memory. When a new thread is created, an empty slot gets assigned to the new
     * thread and populated with memory pages for the stack and thread-specific
     * data. Note that this memory is allocated from the RAM session of the
     * component environment and not accounted for when using the 'sizeof()'
     * operand on a 'Thread' object.
     *
     * A thread may be associated with more than one stack. Additional secondary
     * stacks can be associated with a thread, and used for user level scheduling.
     */

To locate TLS data, Genode uses the __emutls_get_address function from repos/base/src/lib/cxx/emutls.cc, which calls Thread::myself() to obtain a pointer to the Genode::Thread. Thread::myself(), from base/src/lib/base/thread_myself.cc, just takes an address on the current stack and rounds it to find the hidden Stack structure stored there:

    Genode::Thread *Genode::Thread::myself()
    {
        int dummy = 0; /* used for determining the stack pointer */

        ...
        addr_t base = Stack_allocator::addr_to_base(&dummy);
        return &Stack_allocator::base_to_stack(base)->thread();
    }
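
The lookup described above can be illustrated with a small stand-alone model. This is only a sketch under assumptions: the slot size, the position of the metadata inside the slot, and the names `alloc_slot` and `thread_of` are hypothetical and do not reproduce Genode's real Stack_allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

/* Hypothetical stand-ins, not Genode's real types. */
struct Thread { const char *name; };

/* Metadata stored at a fixed position inside every stack slot. */
struct Stack { Thread *thread; };

constexpr std::size_t SLOT_SIZE = 1u << 16;  /* assumed slot size, power of two */

/* Round any address inside a slot down to the slot's base address. */
inline std::uintptr_t addr_to_base(const void *addr)
{
    return reinterpret_cast<std::uintptr_t>(addr)
         & ~(std::uintptr_t(SLOT_SIZE) - 1);
}

/* In this model the Stack object sits at the high end of the slot. */
inline Stack *base_to_stack(std::uintptr_t base)
{
    return reinterpret_cast<Stack *>(base + SLOT_SIZE - sizeof(Stack));
}

/* Allocate one slot aligned at the slot size and bind it to 'owner'. */
inline void *alloc_slot(Thread &owner)
{
    void *slot = std::aligned_alloc(SLOT_SIZE, SLOT_SIZE);
    if (slot)
        new (base_to_stack(reinterpret_cast<std::uintptr_t>(slot))) Stack{&owner};
    return slot;
}

/* Identify the owning thread from nothing but an address within the slot. */
inline Thread *thread_of(const void *addr_in_slot)
{
    return base_to_stack(addr_to_base(addr_in_slot))->thread;
}
```

Whatever code executes with its stack pointer inside a given slot is attributed to the thread recorded in that slot, no matter which OS thread is actually running it.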

So, if I switch to a stack that belonged to another OS thread at creation time, the lookup finds a «foreign» Stack structure with the wrong Thread pointer. That pointer is used as the key in __emutls_get_address and therefore yields the wrong addresses for variables with the TLS attribute (per OS thread).
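
To make the consequence concrete, here is a toy model of TLS keyed by the Thread pointer. `Emulated_tls` and `myself_by_stack` are hypothetical names for illustration only; the real emutls.cc is more involved.

```cpp
#include <cassert>
#include <map>

struct Thread { int id; };  /* hypothetical stand-in for Genode::Thread */

/* Emulated TLS: one int per thread, keyed by the Thread pointer. */
class Emulated_tls
{
    std::map<const Thread *, int> _slots;

public:
    int &get(const Thread *key) { return _slots[key]; }
};

/* The stack remembers the thread it was created for. */
struct Stack { Thread *creator; };

/* A stack-based 'myself()' can only report the creator of the stack. */
inline Thread *myself_by_stack(const Stack &current_stack)
{
    return current_stack.creator;
}
```

If OS thread 2 runs on a stack created for thread 1, `myself_by_stack` returns thread 1, so the TLS lookup hands thread 2 the variables of thread 1.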

So, to fix this, when switching context to a non-local thread (with setcontext() or even longjmp()), I should update this data in the stack that is about to run. I see three problems here:

1. Updating the Genode::Stack content. While all the necessary fields are declared private, I could update at least the fields _thread, _native_thread and _utcb via access functions. Maybe it is better to memcpy the Stack object content to the stack being switched to?
2. Updating the content of the UTCB area, which is opaque. I am not 100% sure that simply copying its content with memcpy from the current stack before the switch will always work.
3. A locking problem: you have to be sure that nobody else accesses your context data structure and overrides it, and that the current stack data is not updated or cleared during the copy (e.g. if the copy is interrupted by the OS thread scheduler).

In theory this could leak some data structures referenced from the stack (we would just override fields that hold the only references), although this is not clear: if we have more stacks than OS threads and override some of the related per-stack object pointers, can they leak?

This could also hurt performance (every setcontext would have to dig inside the stack and copy data in some cases).

Maybe a more correct solution is not to store the association «OS thread <-> Genode thread» inside the stack, but to have a separate registry where the native thread id serves as the key to find a stack address? This is not as portable as the current solution and requires the implementation (generalisation) of something like a thread_id. In such an implementation, myself() would ask for the current thread_id, obtain the related stack address from the registry, and use it later (e.g. to compare it with the current stack address).
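
A minimal sketch of such a registry, assuming the native thread id is available as `std::thread::id`. `Thread_registry`, `bind_current` and `unbind_current` are hypothetical names; nothing like this exists in Genode as far as this thread shows.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <unordered_map>

struct Thread { const char *name; };  /* hypothetical stand-in */

/* Maps a native thread id to its Thread object, independent of any stack. */
class Thread_registry
{
    std::mutex                                    _mutex;
    std::unordered_map<std::thread::id, Thread *> _by_id;

public:
    /* Register the calling OS thread. */
    void bind_current(Thread &t)
    {
        std::lock_guard<std::mutex> guard(_mutex);
        _by_id[std::this_thread::get_id()] = &t;
    }

    /* Remove the calling OS thread from the registry. */
    void unbind_current()
    {
        std::lock_guard<std::mutex> guard(_mutex);
        _by_id.erase(std::this_thread::get_id());
    }

    /* Stack-agnostic 'myself()': nullptr for unregistered threads. */
    Thread *myself()
    {
        std::lock_guard<std::mutex> guard(_mutex);
        auto it = _by_id.find(std::this_thread::get_id());
        return it == _by_id.end() ? nullptr : it->second;
    }
};
```

The price is a lock and a hash lookup on every `myself()` call, compared with the mask-and-offset arithmetic of the stack-based lookup.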

Uwe

15 Jul

11:24 a.m.

New subject: Aw: Re: How to switch thread stack between threads?

...

Sent: Wednesday, 14 July 2021 at 12:55
From: "Alexander Tormasov via users" users@lists.genode.org
To: "Genode users mailing list" users@lists.genode.org
Cc: "Alexander Tormasov" a.tormasov@innopolis.ru
Subject: Re: How to switch thread stack between threads?

...

A thread may be associated with more than one stack. Additional secondary stacks can be associated with a thread, and used for user level scheduling.

Did you see this ^ ? {g,s}etcontext() count as user level scheduling!

...

So, to fix this, when switching context to a non-local thread (with setcontext() or even longjmp()), I should update this data in the stack that is about to run.

User-level switching is only valid within the same thread. The only way to do this is a local user-level switch to a user-level thread that immediately blocks its own OS-level thread and wakes the OS-level thread that corresponds to the target user-level thread and is blocked in the same procedure. On wakeup, the user-level thread that was blocked at the OS level reads the target user-level thread and makes a local user-level switch to it. The mutex on which the user-level threads block (at least its address) needs to be part of the context.
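
The handoff described above can be sketched with standard C++ threads. This is a simplified model under assumptions: a context is represented by a `std::function` rather than a real stack, `Os_thread` and `hand_over` are hypothetical names, and only the wake-up of the owning thread is shown (the caller would block on its own channel in the same way).

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

/* One OS-level thread with a wake-up channel (hypothetical model). */
class Os_thread
{
    std::mutex              _mutex;
    std::condition_variable _wakeup;
    std::function<void()>   _resume;        /* stands in for a user-level context */
    bool                    _quit = false;
    std::thread             _thread;        /* declared last: started after the rest */

    void _loop()
    {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(_mutex);
                /* blocked at the OS level until a context is handed over */
                _wakeup.wait(lock, [this] { return _resume || _quit; });
                if (!_resume)
                    return;                 /* quit requested, nothing pending */
                job = std::move(_resume);
                _resume = nullptr;
            }
            job();                          /* the local switch, on the owning thread */
        }
    }

public:
    Os_thread() : _thread([this] { _loop(); }) { }

    ~Os_thread()
    {
        { std::lock_guard<std::mutex> guard(_mutex); _quit = true; }
        _wakeup.notify_one();
        _thread.join();
    }

    std::thread::id id() const { return _thread.get_id(); }

    /* Called from any thread: wake the owner and let it resume the context. */
    void hand_over(std::function<void()> context)
    {
        { std::lock_guard<std::mutex> guard(_mutex); _resume = std::move(context); }
        _wakeup.notify_one();
    }
};
```

The channel holds a single pending context, so a second `hand_over` before the first is consumed would overwrite it; a real implementation would need a queue or a handshake.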

Genode users mailing list users@lists.genode.org https://lists.genode.org/listinfo/users

Alexander Tormasov

4:49 p.m.

Hello, Uwe

...

...

A thread may be associated with more than one stack. Additional secondary

stacks can be associated with a thread, and used for user level scheduling.

Did you see this ^ ? {g,s}etcontext() count as user level scheduling!

Yes, but probably we have a different understanding of this term. If we have 2 OS threads running in user space, I assume that I can run code in one thread, then switch this running code to another thread (the second one in this example) and continue execution there? For example, call makecontext/getcontext from the first OS thread and call setcontext with the saved data in the second OS thread? I do not ask for anything related to the OS, including invoking it; I just want to be able to continue execution in an already started OS thread.

The current golang code supports this model in goroutines, and the only problem is that it uses a small number of thread-local storage (TLS) variables for its operation. I found that if I switch code the same way as I do on, e.g., Linux, it does not work on Genode: Linux defines «myself» using system calls that obtain the thread id (this operation is stack agnostic), while Genode hardwires the stack to the backend OS thread and does not allow a simple setcontext()-like switch to another OS (non-user!) thread with its own stack, and, therefore, the TLS variables become wrong.

...

...
So, to fix this, when switching context to a non-local thread (with setcontext() or even longjmp()), I should update this data in the stack that is about to run.

user level switching is only valid within the same thread. The only way to do this is to do a local user level switch to a user level thread that immediately blocks the os level thread and wakes the os level thread, that is blocked in the same procedure and corresponds to the target user level thread. At wakeup that user level thread, which was blocked at the os level, reads the target user level thread and makes a local user level switch to it. The Mutex on which the user level threads blocks (at least its address) needs to be part of the context.

Thank you for the proposed solution. I have a question related to it: do you assume that the same function is started with the same stack instance or with a different one? If the same, it will contain the correct user state but incorrect (old) OS-thread-related data (as I have now); if not the same, do I need to read the content of the old stack etc., parse it, and copy it to a new stack on the new OS thread?

The latter option is incorrect: if we have a local reference stored inside the stack, then we are doomed…

This is an example of code that will not work:

    void f(int *p)
    {
        *p = 2;
        getcontext() … /* from here on we may be running in the old or in the new thread */
        *p = 3;        /* p must still point into the local stack; we have to be sure it does not point outside */
    }

    void g()
    {
        int a = 0;
        f(&a);
        print(a);
    }

If I call g() and switch inside f() to a new thread, then the local stack of g() and f() contains the reference &a to a variable inside that stack. So, if I just copy the stack, run the new code and free the old stack, the copy still contains a reference into the old stack, and "3" is written not to the variable but somewhere else.
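
The failure mode described above can be reproduced without any real context switch. The following is a toy model: `Toy_stack`, `copy_stack` and `pointer_is_local` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstring>

/* A toy "stack": one local variable plus a saved pointer to it, as in g()/f(). */
struct Toy_stack
{
    int  a;  /* the local variable 'a' of g() */
    int *p;  /* the argument 'p' of f(), pointing at 'a' */
};

/* Copy the raw bytes, the way a naive stack migration would. */
inline void copy_stack(Toy_stack &dst, const Toy_stack &src)
{
    std::memcpy(&dst, &src, sizeof(Toy_stack));
}

/* True if the saved pointer refers to the stack it is stored in. */
inline bool pointer_is_local(const Toy_stack &s)
{
    return s.p == &s.a;
}
```

After `copy_stack`, the pointer in the copy still refers to the variable in the original, so a write through it never reaches the copied variable. Once the original stack is freed, the same write becomes a use-after-free.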

In golang code we typically first save the context in an arbitrary OS thread using getcontext(), and later run code that just reads the saved context and sets it for the current OS thread. If this is the same OS thread, everything works fine. If this is another OS thread, we are already running in it and just try to set the RSP register (on x86) to point to the old stack, which is attributed to the old OS thread (it references the Thread object and the UTCB, at least in Genode)…

I suppose that the only reasonable straightforward solution is to copy the OS thread data (Thread and UTCB objects) from the new thread, where I am now running, to the stack that is about to be installed (typically taken from a getcontext/makecontext call), maybe by introducing a re-construction function (or method?) to be applied to a Stack instance. The Stack definitely contains references to itself (e.g. the _thread and _utcb pointers), so a simple memcpy will not work… The main question is that this approach requires confidence that the Thread and UTCB objects do not contain references to their own fields, and for the UTCB this is definitely not agnostic of the low-level OS...
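
A minimal sketch of what such a re-construction method might look like, on a toy model rather than the real Genode::Stack. The `rebind()` method, the `Utcb` layout and `utcb_is_internal()` are assumptions made for illustration; whether kernel-side state tied to the UTCB tolerates this on a given platform is exactly the open question raised above.

```cpp
#include <cassert>

struct Thread { int id; };                    /* hypothetical stand-in */
struct Utcb   { unsigned char payload[64]; }; /* opaque in reality, size assumed */

class Stack
{
    Thread *_thread;
    Utcb    _utcb_storage;
    Utcb   *_utcb;  /* self-reference: points into this very object */

public:
    explicit Stack(Thread &t)
    : _thread(&t), _utcb_storage{}, _utcb(&_utcb_storage) { }

    /* Byte-wise copies would leave '_utcb' pointing at the source object. */
    Stack(const Stack &)            = delete;
    Stack &operator=(const Stack &) = delete;

    /* Re-attribute the stack to another thread, keeping self-references intact. */
    void rebind(Thread &new_owner) { _thread = &new_owner; }

    Thread &thread() const { return *_thread; }

    bool utcb_is_internal() const { return _utcb == &_utcb_storage; }
};
```

Updating individual fields in place avoids the self-reference problem of memcpy, but it says nothing about state the kernel keeps for the UTCB.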

A more generic solution could be to implement a kind of registry for the associations between OS-level threads and Genode Threads without involving stack instances, potentially with a kind of virtualisation of the low-level OS thread id with a one-to-one translation to a Genode id. IMHO, in general it would be good for Genode to have virtualisation like namespaces/cgroups in Linux or Windows. This would also simplify checkpoint/restore, migration, fast restart of drivers and core, and other related cross-instance operations.

Alexander

Uwe

6:49 p.m.

New subject: Aw: Re: How to switch thread stack between threads?

...

Sent: Thursday, 15 July 2021 at 16:49
From: "Alexander Tormasov via users" users@lists.genode.org
To: "Genode users mailing list" users@lists.genode.org
Cc: "Alexander Tormasov" a.tormasov@innopolis.ru
Subject: Re: How to switch thread stack between threads?

Hello, Uwe

...
...

A thread may be associated with more than one stack. Additional secondary

stacks can be associated with a thread, and used for user level scheduling.

Did you see this ^ ? {g,s}etcontext() count as user level scheduling!

Yes, but probably we have a different understanding of this term. If we have 2 OS threads running in user space, I assume that I can run code in one thread, then switch this running code to another thread (the second one in this example) and continue execution there? For example, call makecontext/getcontext from the first OS thread and call setcontext with the saved data in the second OS thread?

You can run code in whatever thread you want. However, the result may not be what you want, but that depends on your code. The model you should think about is the following. A process has many os level threads. Every os level thread has at least one stack. Every stack equals a user level thread. Any user level thread belongs to one and only one os level thread. This is a strict two level tree.

...

I do not ask for anything related to the OS, including invoking it; I just want to be able to continue execution in an already started OS thread.

The current golang code supports this model in goroutines, and the only problem is that it uses a small number of thread-local storage (TLS) variables for its operation. I found that if I switch code the same way as I do on, e.g., Linux, it does not work on Genode: Linux defines «myself» using system calls that obtain the thread id (this operation is stack agnostic), while Genode hardwires the stack to the backend OS thread and does not allow a simple setcontext()-like switch to another OS (non-user!) thread with its own stack, and, therefore, the TLS variables become wrong.

That depends on the implementation of setcontext(). As I said below, there must be a mutex that should be part of the context (and denotes the os level thread the context belongs to). And setcontext() must run two different algorithms, depending on the mutexes that are part of the contexts. If not you will run a stack on the wrong os level thread. With bad consequences.
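
The two-level model and the two code paths of setcontext() described above can be sketched as a small data model. All names (`Os_thread`, `User_thread`, `attach`, `classify_switch`) are hypothetical, and the sketch only decides which path applies; it does not perform a switch.

```cpp
#include <cassert>
#include <vector>

struct Os_thread;

/* A user-level thread is a stack that belongs to exactly one OS-level thread. */
struct User_thread
{
    Os_thread *owner = nullptr;
};

/* An OS-level thread owns one or more stacks. */
struct Os_thread
{
    std::vector<User_thread *> stacks;
};

/* Establish the ownership link in both directions. */
inline void attach(Os_thread &os, User_thread &ut)
{
    ut.owner = &os;
    os.stacks.push_back(&ut);
}

enum class Switch_kind { LOCAL, CROSS_THREAD_HANDOFF };

/* Decide which of the two algorithms a setcontext()-like call has to run. */
inline Switch_kind classify_switch(const Os_thread &current,
                                   const User_thread &target)
{
    return target.owner == &current ? Switch_kind::LOCAL
                                    : Switch_kind::CROSS_THREAD_HANDOFF;
}
```

In this model the owner pointer plays the role of the mutex address stored with the context: it names the OS-level thread on which the stack is allowed to run.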

thank you for proposed solution. I have a question related to it: you assume to start the same function with stack instance or different one?

It can be the same or a different one. What matters is that the OS-level thread matches the user-level thread. If they are the same, this is trivially true. If not, special synchronization is needed, as described.

In golang code typically we first save context in arbitrary os thread using getcontext(), then we will run code which just read saved context and set it for current OS thread.

In Genode you can't migrate a user-level thread (= context) to another OS thread. A user-level thread implies its OS-level thread. The closest equivalent is to use Duff's device to record continuations in one OS-level thread and replay them in the target thread, although I would recommend against it.


Alexander Tormasov

9:21 p.m.

...

...
So, to fix this, when switching context to a non-local thread (with setcontext() or even longjmp()), I should update this data in the stack that is about to run.

user level switching is only valid within the same thread. The only way to do this is to do a local user level switch to a user level thread that immediately blocks the os level thread and wakes the os level thread, that is blocked in the same procedure and corresponds to the target user level thread. At wakeup that user level thread, which was blocked at the os level, reads the target user level thread and makes a local user level switch to it. The Mutex on which the user level threads blocks (at least its address) needs to be part of the context.

Probably I gave the wrong picture of the operations:

1. I run an arbitrary function with a stack associated with the first thread.
2. I copy the current context using getcontext and store it somewhere.
3. I stop executing the function from step 1 by switching to another function/stack associated with thread 1.
4. After some time I create a new OS thread and run some code inside it.
5. Then, inside thread 2, I take the old context from step 2 above and perform setcontext from inside thread 2, replacing the function currently running in that thread with the saved state of the first one.

So, I don't need mutexes or waiting for anything: there is a time gap between suspending the function on OS thread 1 and continuing it on thread 2. At the time of continuation I have already switched to thread 2… I just need to associate the old stack with it (it contains the state of the function/stack from step 1).

Uwe

16 Jul

11:17 a.m.

New subject: Aw: Re: How to switch thread stack between threads?

...

Sent: Thursday, 15 July 2021 at 21:21
From: "Alexander Tormasov via users" users@lists.genode.org
To: "Genode users mailing list" users@lists.genode.org
Cc: "Alexander Tormasov" a.tormasov@innopolis.ru
Subject: Re: How to switch thread stack between threads?

probably I give a wrong picture of operations

I did understand you the first time around. But I have to disappoint you. What you want is IMPOSSIBLE! At least in genode. Because you can't create such a context.

...

I run arbitrary function with stack associated with first thread

Every stack implies (is bound to) an os level thread.

...

I copy current context using getcontext and store it somewhere

You must store a pointer to an os level object that holds the os level thread (mutex is fine) with the context to be able to later resume in the correct context.

...

I stop doing function from 1, by switching to another function/stack associated with thread 1

That is possible.

...

after some time I create a new os thread and run some code inside it

then inside 2 thread I take old context from 2 above and perform setcontext from inside 2 thread to replace current function with state in the thread to the first one

You cannot use the context from 2 in another thread. Alternatively, you can reconstruct the call chain from the first thread in the second thread with Duff's device (https://en.wikipedia.org/wiki/Duff%27s_device#See_also). Then you can construct a mirror context, which is the first context but in the second thread.
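
For readers unfamiliar with the technique, here is a minimal sketch of a continuation recorded in the style of Duff's device: a `switch` that jumps back into the middle of a loop. `Counter` and `resume` are hypothetical names; the point is only that all state lives in an ordinary object instead of on a stack.

```cpp
#include <cassert>

/* Continuation of a function that yields the numbers first..last. */
struct Counter
{
    int line = 0;  /* where to resume: 0 = start, 1 = inside the loop, 2 = done */
    int i    = 0;  /* loop variable, kept in the object, not on the stack */
    int first;
    int last;

    Counter(int f, int l) : first(f), last(l) { }
};

/* Returns true and sets 'out' for each yielded value, false when finished. */
inline bool resume(Counter &c, int &out)
{
    switch (c.line) {
    case 0:
        for (c.i = c.first; c.i <= c.last; ++c.i) {
            out    = c.i;
            c.line = 1;
            return true;   /* "yield" */
    case 1:;               /* re-entry point in the middle of the loop body */
        }
    }
    c.line = 2;            /* no case matches 2, so later calls skip the loop */
    return false;
}
```

Because a `Counter` contains no pointers into any stack, it can be resumed by whichever thread happens to hold it. The cost is that every function in the call chain has to be written in this style, which is the rewriting effort objected to later in the thread.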

...

So, I don’t need mutexes and wait for something - I have a time gap between suspend of function in os 1 and it continuation on thread 2.

The mutexes have another purpose. They regulate the os level threads when user level threads yield.


Alexander Tormasov

5:40 p.m.

...

...
probably I give a wrong picture of operations

I did understand you the first time around. But I have to disappoint you. What you want is IMPOSSIBLE! At least in genode. Because you can't create such a context.

I do not plan to create it; I plan to update the saved one by taking some data from the running thread. I am also considering re-mapping a part of the stack area. We know that all auxiliary (OS-related) structures of the current and the saved context have the same size (in a running instance) and the same mapping offset from the start of the area, and could even be aligned to a page boundary. So, I can potentially save the ucontext, take the last stack pointer (e.g. the RSP register), and re-map the OS-related areas to the currently running thread… although technically this is similar to just copying them. This is a kind of hack, and I am still not sure that it will work reliably (although it could definitely be limited to the combination of utcb and native_thread structure states for some platform; they could be ported one by one).

...

...

I run arbitrary function with stack associated with first thread

Every stack implies (is bound to) an os level thread.

...

I copy current context using getcontext and store it somewhere

You must store a pointer to an os level object that holds the os level thread (mutex is fine) with the context to be able to later resume in the correct context.

As I see it, at this moment the Stack object contains two data structures, native_thread and utcb, that are handled (meaning updated) by the native OS as a way to store native OS thread data (as well as a third one, the "Thread object" reference, which is a Genode object). How would they be related to the proposed mutex?

...

...

I stop doing function from 1, by switching to another function/stack associated with thread 1

That is possible.

...

after some time I create a new os thread and run some code inside it

then inside 2 thread I take old context from 2 above and perform setcontext from inside 2 thread to replace current function with state in the thread to the first one

You can not use context from 2 in another thread. Alternatively you can reconstruct the call chain from the first thread in the second thread with Duffs Device (https://en.wikipedia.org/wiki/Duff%27s_device#See_also) And then you can construct a mirror context, which is the first context but in the second thread.

The problem is that the context model is already implemented inside the golang runtime (~1M lines of code) using the set/get/makecontext calls. I do not understand how I could emulate them using Duff's device co-routines without significant modification (mostly rewriting) of code that is not mine, possibly including the Go compiler: golang uses stack variables, and the compiler generates code which handles them. As far as I know, the Duff's device approach requires a different model in which nothing is stored on the stack... Also see below.

...

...
So, I don’t need mutexes and wait for something - I have a time gap between suspend of function in os 1 and it continuation on thread 2.

The mutexes have another purpose. They regulate the os level threads when user level threads yield.

Golang already does this. As part of the language and runtime, it has wrappers around syscalls and user-space preemption points. It is aware of the existence of OS threads and of the plurality of goroutines, and it remaps goroutines from one OS thread to another at preemption points. So user-level threads never call the kernel directly to yield etc.; this is done at the language/runtime level, see (1).

In short, the runtime periodically checks whether preemption is possible, and always does so around a syscall (before and after). It handles blocked threads (AKA M structures) itself, keeping different queues for global/local instances, idle/blocked states, etc., and manipulates native OS threads (create/block/delete/etc.) via pthread or similar libraries. E.g., if a goroutine G running on OS thread M makes a blocking syscall, then before the actual call the runtime «parks» M and the corresponding G in such a way that it uses an OS object to wait (e.g. a futex on Linux), and it correctly resumes the blocked M after the syscall returns, again via the user-level scheduler and a preemption point (it just activates M and finds an appropriate goroutine for it to run, which may be the one that made the syscall and blocked, or may be another one chosen by the user-level scheduler). This again assumes that goroutines, together with their stacks, can move between OS threads.

In the latest version of the runtime, the golang developers even implemented their own simplified «setcontext» in asm without kernel syscalls (those are typically made for signal processing); this is the core of the whole runtime. So if we want Golang to run inside Genode, we need to find a way to support its model of context switches by emulating make/set/getcontext semantics. It also assumes that any address can be used as a thread stack (but Genode currently allows only pre-defined, «chunked» stacks). This significantly limits the number of goroutines that can co-exist (in really heavily loaded programs there can be tens of thousands of them…).

That is why I still think it may be worth having a somewhat different way to obtain a reference to the Thread object from a Genode thread, by replacing the Thread::myself() function (this is just an idea; I am not sure it is possible to implement it for one particular application only).
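To make the lookup problem concrete, here is a toy model in C of identifying the current thread from a stack address, as described earlier in the thread. It is not Genode's code: the constants `STACK_AREA_BASE` and `STACK_SLOT_SIZE`, the `slot_owner` table and the functions `myself_by_stack()` and `bind_slot()` are all hypothetical names chosen for illustration. It only shows why a context saved on thread 1's stack keeps resolving to thread 1, and what "pointing it to the new thread" would mean in such a model.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative values only; a real stack-area layout is platform-specific. */
#define STACK_AREA_BASE 0x40000000UL
#define STACK_SLOT_SIZE 0x00100000UL /* one virtual slot per stack */
#define MAX_SLOTS       256

struct thread_obj { int id; };

/* Hypothetical table: which thread object owns which stack slot. */
static struct thread_obj *slot_owner[MAX_SLOTS];

/* Identify the "current thread" purely from an address on its stack. */
struct thread_obj *myself_by_stack(uintptr_t sp)
{
    size_t slot;

    if (sp < STACK_AREA_BASE)
        return NULL; /* not inside the stack area */
    slot = (size_t)((sp - STACK_AREA_BASE) / STACK_SLOT_SIZE);
    return slot < MAX_SLOTS ? slot_owner[slot] : NULL;
}

/* Rebind a stack slot to another thread object. */
void bind_slot(size_t slot, struct thread_obj *t)
{
    if (slot < MAX_SLOTS)
        slot_owner[slot] = t;
}

int lookup_demo(void)
{
    static struct thread_obj t1 = { 1 }, t2 = { 2 };
    uintptr_t saved_sp;
    int before, after;

    bind_slot(0, &t1);
    bind_slot(1, &t2);

    /* Stack pointer captured in a context while running on slot 0. */
    saved_sp = STACK_AREA_BASE + 0x8000;

    /* Even if OS thread 2 resumes this context, the lookup still
       reports thread 1, because only the address is consulted. */
    before = myself_by_stack(saved_sp)->id;

    /* The rebinding discussed in the thread: slot 0 now names thread 2. */
    bind_slot(0, &t2);
    after = myself_by_stack(saved_sp)->id;

    return before * 10 + after;
}
```

In this model `lookup_demo()` returns 12: the first lookup yields thread 1 and, after rebinding, the second yields thread 2. Whether such a rebinding can be done safely in Genode is exactly the open question of this thread.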

1 https://medium.com/swlh/different-threading-models-why-i-feel-goroutine-is-b...

Uwe

8:09 p.m.

New subject: Aw: Re: How to switch thread stack between threads?

...

Sent: Friday, 16 July 2021 at 17:40 From: "Alexander Tormasov via users" users@lists.genode.org To: "Genode users mailing list" users@lists.genode.org Cc: "Alexander Tormasov" a.tormasov@innopolis.ru Subject: Re: How to switch thread stack between threads?

...
...
probably I give a wrong picture of operations

I did understand you the first time around. But I have to disappoint you. What you want is IMPOSSIBLE! At least in genode. Because you can't create such a context.

I do not plan to create it, I plan to update saved by taking some data from running thread. I am also considering re-mapping of a part of stack area. We do know that all aux (os-relates) structures from current and saved context has the same size (in running instance) and mapping offset from start of area, even could be aligned to page bound.

For that you would need support from the OS. It could be that the UTCB contains capabilities (at least in some constellations), which will of course be protected. That in turn means you won't get that support. And if that support does exist in some constellations (because no capabilities have to be protected), it will be seen as a mistake and rescinded as fast as possible.

...

so, I can potentially save ucontext, take last stack pointer (e.g RSP register) and re-map OS-related areas to currently running thread… while technically it is similar to just copying it. this is a kind of hack, still not sure that it will work reliably (while it could be definitely limited to combination of utcb and native_thread structure states for some platform, they could be ported on face-by-face)

At the very least it will destroy the mapping from numbers to capabilities.

...

...
...

I run arbitrary function with stack associated with first thread

Every stack implies (is bound to) an os level thread.

...

I copy current context using getcontext and store it somewhere

You must store a pointer to an os level object that holds the os level thread (mutex is fine) with the context to be able to later resume in the correct context.

as I see, in this moment Stack object do contains 2 data structures - native_thread and utcb handled by native OS (mean updated) as a way to store native os thread data (as well as 3-d "Thread object" reference - this is genode object). how they will be related to proposed mutex?

It works like the go thread communication with system calls. The model of user-level threads is very close to goroutines. The only difference I see so far is that user-level threads are pinned to the OS-level thread that created them. And this has to do with capabilities. The OS-level thread contains a registry which maps numbers to capabilities. Every number in a user-level thread can potentially be a valid capability (in which case it must be changed if the OS-level thread, and therefore the registry, changes) or really be a number (in which case it mustn't be changed). Because it is impossible to decide which, the registry has to be locked to the user-level thread. But that in turn is only possible at the OS level. Therefore user-level threads have to be pinned to an OS-level thread. The only exception I could think of is to shed all capabilities, even the one needed to access the stack, which would defeat the purpose.
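The argument above can be illustrated with a small C model. It follows the picture Uwe describes (one number-to-capability registry per OS-level thread) and makes no claim about how Genode actually organises capabilities; `struct registry`, `resolve()` and the session names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define REGISTRY_SLOTS 8

/* Hypothetical per-thread registry: index -> object the capability names. */
struct registry {
    const char *object[REGISTRY_SLOTS];
};

/* Resolve a plain number against one particular registry. */
const char *resolve(const struct registry *r, size_t index)
{
    return index < REGISTRY_SLOTS ? r->object[index] : NULL;
}

int registry_demo(void)
{
    struct registry thread1 = { .object = { [3] = "file session" } };
    struct registry thread2 = { .object = { [3] = "timer session" } };

    /* A migrated context only carries the integer 3. Nothing in the
       saved data says whether it is a capability index or a number. */
    size_t saved = 3;

    const char *on_thread1 = resolve(&thread1, saved);
    const char *on_thread2 = resolve(&thread2, saved);

    /* Returns 1 when the same number names different objects. */
    return strcmp(on_thread1, on_thread2) != 0;
}
```

Here `registry_demo()` returns 1: the same saved integer resolves to different objects depending on which registry is consulted, which is the reason given for pinning user-level threads to their OS-level thread.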

...

...
...

I stop doing function from 1, by switching to another function/stack associated with thread 1

That is possible.

...

after some time I create a new os thread and run some code inside it

then inside 2 thread I take old context from 2 above and perform setcontext from inside 2 thread to replace current function with state in the thread to the first one

You can not use context from 2 in another thread. Alternatively you can reconstruct the call chain from the first thread in the second thread with Duffs Device (https://en.wikipedia.org/wiki/Duff%27s_device#See_also) And then you can construct a mirror context, which is the first context but in the second thread.

The problem that model of context is already implemented inside golang runtime (size of ~1m lines of code) using set/get/make context calls. I do not understand how I can emulate them using Duff device co-routines without significant modification (mostly rewriting) of this not-mine code? Even including go compiler: golang use stack variables, and do generate code which handle them (compiler). Duff device approach require different model as I know, nothing should be stored in the stack... Also see below

...
...
So, I don’t need mutexes and wait for something - I have a time gap between suspend of function in os 1 and it continuation on thread 2.

The mutexes have another purpose. They regulate the os level threads when user level threads yield.

golang already does this. as a part of language and runtime, it have a wrappers around sys calls and user-space preemtion points.

And around {g,s}etcontext() too? If not, you would have to write it in. If yes, these mutexes have to become part of the context (a pointer is enough).

...

It aware about existence of OS threads and plurality if goroutines, and remap goroutines in preemption points from one OS thread to another one. so, user lever threads never call kernel directly to yield/etc, this is done on the language/runtime level. see (1) in short, it periodically check possibility of preemption, and always do it during sys call (before and after), and handle blocked threads (AKA M structures) by itself having different queues for global/local instances, idle/blocked state/etc, and manipulate native OS threads (create/block/delete/etc) via pthread or similar libraries. E.g., if you have a blocking sys call from inside goroutine G running on M OS thread, then, before actual call, it «park» M and appropriate G in such a way that it already utilise OS object to wait (e.g. futex on linux) and correctly return/resurrect blocked M after return from syscall, again via user-level scheduler and preemption point (it just activate M and find appropriate goroutine for it to run inside context, may be the same as make syscall and block it, may be another - via user-level scheduler). This again assume mobility to goroutines with related stack between OS threads.

IN the last version of runtime golang developers even implement own simplified «setcontext» in asm without kernel sys calls (typically made for signals processing), - this is a core of all runtime. So, if we want to have Golang running inside genode - we need to find a way to support their model of context switches by emulation of make/set/getcontext semantics.

In principle they are close enough. But migrating goroutines to another os level thread has to stop. Or the goroutines are interpreted (at least at preemption points) and the user level threads all(!) run the same VM that does the interpreting.

...

And, it assume that any address could be used as a stack for thread (but genode allow only pre-defined and «chunked» stacks now). This significantly limit the number of potential goroutines co-existing (in real heavy load programms it could be 10-th K of them…).

Not really if you use the interpreting model. That would implement a third level. Many goroutines per user level thread. Many user level threads per os level thread. Many os level threads per process. For this you only have to write an API that redirects all os level thread manipulation that the go runtime needs to the level of user level threads. And implement trampoline user level threads to allow migration of goroutines between user level threads.

...

Thats why I still think that may be it worth to have a bit different way to obtain from genode thread reference to the Thread object by replacing Thread::myself() function (this is just an idea, not sure is it possible to implement it only for particular application).

1 https://medium.com/swlh/different-threading-models-why-i-feel-goroutine-is-b...


Alexander Tormasov

18 Jul

12:13 a.m.

...

...
...
...

I run arbitrary function with stack associated with first thread

Every stack implies (is bound to) an os level thread.

...

I copy current context using getcontext and store it somewhere

You must store a pointer to an os level object that holds the os level thread (mutex is fine) with the context to be able to later resume in the correct context.

as I see, in this moment Stack object do contains 2 data structures - native_thread and utcb handled by native OS (mean updated) as a way to store native os thread data (as well as 3-d "Thread object" reference - this is genode object). how they will be related to proposed mutex?

It works like the go thread communication with system calls. The model of user level threads is very near the goroutines. The only difference I see so far is that user level threads are pinned to the os level thread that created them. And this has to do with capabilities. The os level thread contains a registry which maps numbers to capabilities. Because every number in an user level thread can potentially be a valid capability (in which case it must be changed if the os level thread and therefore the registry is changed) or really be a number (in which case it mustn't be changed) it is impossible to decide and therefore the registry has to be locked to the user level thread. But that is in turn only possible at the os level. Therefore user level threads have to be pinned to an os level thread. The only exception I could think of is to shed all capabilities. Even the one neded to access the stack. Which would defeat the purpose.

The pthread model below the golang runtime, which is based on the Genode port of libc, has a somewhat different model (or at least emulates one). It assumes a common space for the created threads (each of which is mapped, correct me if I am wrong, to a single OS thread in 1<->1 mode), at least for the common memory space and some subset of capabilities related to OS resources, shared between the threads. E.g. I assume that a file opened in one thread can be used in another thread; therefore the related capability is common to them, since it is created from a single session. So the translation number <-> capability that you mention above is not broken. In general, I think the analogy between a capability and a pointer to memory or a handle/descriptor in a traditional OS is correct. Golang does not know about capabilities; everything is hidden inside the translation mechanism from kernel to user space and back, in the same way as in a standard OS I do not know the kernel address of the structure behind an fd.

Talking about the golang model: it uses a common space for memory and descriptors and uses as few OS threads as possible. Typically the number of OS threads equals the number of processor cores, and it grows only when a thread is blocked by a syscall (in that case the thread is stopped and a new thread is taken from the idle list or created from scratch). The reason for goroutine migration between OS threads is performance: we just need to keep each core busy with something. By contrast, migration of code between different processes in Linux/Windows requires virtualisation of descriptors/handles, because the same fd value means different open files in different processes/domains.

Talking about the optimal model in Genode, I think that if we can group a set of threads in the same way as we do with the pthread model (making a kind of «domain of threads with shared resources» derived from the main one), then we can migrate code inside this «domain». This could be an analog of a «process with a set of OS threads sharing the same objects/memory space» as in Linux/Windows/other OSes. This approach gives us a kind of «naturally limited» capability/allowance to move code only between the involved OS threads, not to any other ones where the actual values of the numbers could be the same.

...

...
golang already does this. as a part of language and runtime, it have a wrappers around sys calls and user-space preemtion points.

And around {g,s}etcontext() too? If no, you would have to write it in. If yes these mutexes have to become (pointer is enough)part of the context.

Yes, the places where they are called are limited and clear. In the C part of the runtime I can keep pointers to anything, e.g. the Thread object/utcb/native thread, or even create a mutex, inside the context if needed. The only question is how exactly to use it.

They are taken in some context (one OS thread), and when the code runs in another thread, should I just lock this mutex from the old context? I need this code to run, not the old OS thread… So, as I understand it:

1. I run OS thread 1 and create a mutex inside it
2. inside thread 1 I run getcontext and also store the mutex reference in the context
3. I create OS thread 2
4. inside the code running in OS thread 2 I read the context saved in step 2 and wait for the mutex using the reference (?)
5. after returning from the mutex wait inside OS thread 2 I call setcontext and continue execution of the goroutine in OS thread 2

In step 4 above I should not switch to the original OS thread 1, since it is busy with some other goroutines...
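For reference, this is roughly what the steps above look like on a conventional POSIX system. The sketch assumes Linux with glibc and pthreads, where a context created with makecontext() on a heap-allocated stack can be resumed from another OS thread; it does not show that the same works on Genode, which is the point under dispute. The names `struct coro`, `coro_resume()`, `coro_yield()` and `migrate_demo()` are hypothetical, and the mutex stored next to the context only guarantees that one OS thread at a time runs the saved context.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <pthread.h>
#include <stdlib.h>
#include <ucontext.h>

enum { CORO_STACK_SIZE = 64 * 1024 };

struct coro {
    ucontext_t ctx;       /* saved state of the migrating function */
    ucontext_t caller;    /* state of whichever OS thread resumed it */
    pthread_mutex_t lock; /* stored with the context, as in steps 1-2 */
    void *stack;
    int steps_done;
};

/* makecontext() only passes int arguments, so hand the pointer over here. */
static struct coro *starting_coro;

/* Suspend the function and return to the OS thread that resumed it. */
static void coro_yield(struct coro *c)
{
    swapcontext(&c->ctx, &c->caller);
}

static void coro_body(void)
{
    struct coro *c = starting_coro;

    c->steps_done = 1; /* executed on the first OS thread */
    coro_yield(c);
    c->steps_done = 2; /* executed on whichever thread resumes us */
    /* returning continues at uc_link, i.e. the latest caller */
}

/* Steps 4-5: take the mutex, then switch to the saved context. */
static void coro_resume(struct coro *c)
{
    pthread_mutex_lock(&c->lock);
    swapcontext(&c->caller, &c->ctx);
    pthread_mutex_unlock(&c->lock);
}

static void *second_thread(void *arg)
{
    coro_resume(arg);
    return NULL;
}

int migrate_demo(void)
{
    struct coro c;
    pthread_t t2;
    int first, second;

    c.steps_done = 0;
    c.stack = malloc(CORO_STACK_SIZE);
    if (c.stack == NULL)
        return -1;
    pthread_mutex_init(&c.lock, NULL);

    getcontext(&c.ctx);
    c.ctx.uc_stack.ss_sp = c.stack;
    c.ctx.uc_stack.ss_size = CORO_STACK_SIZE;
    c.ctx.uc_link = &c.caller;
    makecontext(&c.ctx, coro_body, 0);

    starting_coro = &c;
    coro_resume(&c); /* OS thread 1 runs the function up to its yield */
    first = c.steps_done;

    if (pthread_create(&t2, NULL, second_thread, &c) != 0) {
        free(c.stack);
        return -1;
    }
    pthread_join(t2, NULL); /* OS thread 2 finished the function */
    second = c.steps_done;

    pthread_mutex_destroy(&c.lock);
    free(c.stack);
    return first * 10 + second;
}
```

On Linux/glibc `migrate_demo()` is expected to return 12: the first half of `coro_body()` runs on the calling thread and the second half on the newly created one. The function never touches thread-local data, which is the part that breaks when thread identity is derived from the stack.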

...

...
IN the last version of runtime golang developers even implement own simplified «setcontext» in asm without kernel sys calls (typically made for signals processing), - this is a core of all runtime. So, if we want to have Golang running inside genode - we need to find a way to support their model of context switches by emulation of make/set/getcontext semantics.

In principle they are close enough. But migrating goroutines to another os level thread has to stop. Or the goroutines are interpreted (at least at preemption points) and the user level threads all(!) run the same VM that does the interpreting.

IMHO this is too heavyweight a solution and, moreover, golang does not have a language VM or interpreter...

...

...
And, it assume that any address could be used as a stack for thread (but genode allow only pre-defined and «chunked» stacks now). This significantly limit the number of potential goroutines co-existing (in real heavy load programms it could be 10-th K of them…).

Not reallly if you use the interpreting model. That would implement a third level. Many goroutines per user level thread. Many user level threads per os level thread. Many os level threads per process. For this you only have to write an API that redirects all os level thread manipulation that the go runtime needs to the level of user level threads. And implement trampoline user level threads to allow migration of goroutines between user level threads.

With such an approach it seems that we would utilise only one thread, while the main reason for having multiple threads is to run a single thread per CPU, with switchable goroutines, until one of them blocks.

Uwe

10:23 p.m.

New subject: Aw: Re: How to switch thread stack between threads?

...

Sent: Sunday, 18 July 2021 at 00:13 From: "Alexander Tormasov via users" users@lists.genode.org To: "Genode users mailing list" users@lists.genode.org Cc: "Alexander Tormasov" a.tormasov@innopolis.ru Subject: Re: How to switch thread stack between threads?

...
...
...
...

I run arbitrary function with stack associated with first thread

Every stack implies (is bound to) an os level thread.

...

I copy current context using getcontext and store it somewhere

You must store a pointer to an os level object that holds the os level thread (mutex is fine) with the context to be able to later resume in the correct context.

as I see, in this moment Stack object do contains 2 data structures - native_thread and utcb handled by native OS (mean updated) as a way to store native os thread data (as well as 3-d "Thread object" reference - this is genode object). how they will be related to proposed mutex?

It works like the go thread communication with system calls. The model of user level threads is very near the goroutines. The only difference I see so far is that user level threads are pinned to the os level thread that created them. And this has to do with capabilities. The os level thread contains a registry which maps numbers to capabilities. Because every number in an user level thread can potentially be a valid capability (in which case it must be changed if the os level thread and therefore the registry is changed) or really be a number (in which case it mustn't be changed) it is impossible to decide and therefore the registry has to be locked to the user level thread. But that is in turn only possible at the os level. Therefore user level threads have to be pinned to an os level thread. The only exception I could think of is to shed all capabilities. Even the one neded to access the stack. Which would defeat the purpose.

Pthread model below golang runtime which is based on genode port of libc has a bit different model (at least emulated it). It assumes common space for created threads (which as mapped, correct me if I am wrong, to single OS thread in 1<->1 mode), and, at east, for common memory space and some subset of capabilities related to os resources, shared between threads. E.g. I assume that file, opened in one thread, I can use in another thread - therefore, related capability is common between them - it created from single session.

I think that works with TLS. At the user level. Another indirection on top of the native numbers. I think in the POSIX Library.

...

So, numerical translation of number <-> capability , as you mention above, do not broken. in general, I think that and of analogy between capability and pointer to memory and handles/descriptors in traditional OS is correct. Golang do not know about capabilities, everything hidden inside translation mechanism from kernel to user-space and back in the same way as it is in standard OS I do not know kernel address of related to fd structure.

Talking about golang model: they use common space for memory and descriptors and utilise as minimum os thread as possible. typically number of os threads equal number of processor cores, and it growth only in situation when thread blocked by syscall. in such case it stopped and new thread take from idle list or created from the scratch).

I did read your link. And if you disable the global queue it may be possible that go never executes code that is invalid on genode although it is included. And therefore not reliable. And that could be an issue with the philosophy of genode.

...

Reason for goroutine migration between os threads is a performance, we need just to keep core busy by something. e.g. migration of code between different process in linux/windows require virtualisation of descriptors/handles because the same value of fd mean different opened files in different processes/domains.

Talking about optimal model in genode, I think, if we can group a set of threads in the same way as we does with pthread model (making kind of «domain of threads with shared resources» as derivative from main one), then we can migrate code inside this «domain».

This «domain» is the group of user level threads pinned to one os level thread.

...

This could be analog of «process with a set of os threads sharing same objects/memory space» like in linux/windows/other OS. This approach can give us kind of «naturally limited» capability/allowance to move code only between involved OS threads, not to any other ones where real value of numbers could be the same.

...
...
golang already does this. as a part of language and runtime, it have a wrappers around sys calls and user-space preemtion points.

And around {g,s}etcontext() too? If no, you would have to write it in. If yes these mutexes have to become (pointer is enough)part of the context.

yes, places where they called limited and clear, in C part of runtime I can keep pointers to anything, e.g. Thread object/utcb/native thread or even create mutex - inside context if need . the only question is how to use it exactly?

they are taken in some context (one os thread) and when code run in another thread, should I just lock this mutex from old context? I need this, not old is thread to run… so, as I understand,

1. I run os thread and inside create a mutex
2. inside 1 thread I run getcontext and also store inside mutex reference
3. I do create os thread 2
4. inside code run in os thread 2 I do read saved in 2 context and wait for mutex using reference (?)
5. after return from mutex wait inside 2 os thread context I call setcontext and continue execution of goroutine in os thread 2

in 4 above I should not switch to original os thread 1 - it is busy for some other goroutines...

No, you *can* switch to the original OS-level thread, *but* not to the original user-level thread inside it. Or rather, you first message the running original thread to do a user-level switch to a dedicated user-level thread whose job is to wait on the mutex.

...

...
...
IN the last version of runtime golang developers even implement own simplified «setcontext» in asm without kernel sys calls (typically made for signals processing), - this is a core of all runtime. So, if we want to have Golang running inside genode - we need to find a way to support their model of context switches by emulation of make/set/getcontext semantics.

In principle they are close enough. But migrating goroutines to another os level thread has to stop. Or the goroutines are interpreted (at least at preemption points) and the user level threads all(!) run the same VM that does the interpreting.

imho this is too heavyweight solution, and, moreover, golang do not have language VM and interpreter...

Not really. *But only if* you think of the VM inside out. All compiled code between preemption points is a *primitive* in this inside-out VM. And the instructions for the VM describe only the pattern for thread switches.

...

...
...
And, it assume that any address could be used as a stack for thread (but genode allow only pre-defined and «chunked» stacks now). This significantly limit the number of potential goroutines co-existing (in real heavy load programms it could be 10-th K of them…).

Not reallly if you use the interpreting model. That would implement a third level. Many goroutines per user level thread. Many user level threads per os level thread. Many os level threads per process. For this you only have to write an API that redirects all os level thread manipulation that the go runtime needs to the level of user level threads. And implement trampoline user level threads to allow migration of goroutines between user level threads.

in such approach seems that we will utilise the only thread - while the main reason for existence of multiply threads is to run single thread per cpu, with switchable goroutines till any of them will be blocked.

No, the idea is to create 2 os level threads per CPU core. From the start of the program. Independent of code. Half of them run goroutines. The other half run the system calls of these goroutines. The needed task switches should be obvious.
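A minimal sketch of that split for one CPU core, written against plain pthreads rather than Genode's API: one thread stands in for the goroutine runner, and a second, dedicated thread executes the potentially blocking calls on its behalf. All names (`struct proxy`, `proxy_loop()`, `proxy_call()`, `slow_double()`) are hypothetical. To keep the example short the caller simply waits for the result; in the proposed design it would switch to another goroutine instead.

```c
#include <pthread.h>

/* Mailbox shared by the goroutine thread and its syscall thread. */
struct proxy {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    long (*fn)(long); /* the "system call" to run */
    long arg;
    long result;
    enum { IDLE, PENDING, DONE, STOP } state;
};

/* Body of the dedicated syscall thread. */
static void *proxy_loop(void *p)
{
    struct proxy *px = p;

    pthread_mutex_lock(&px->lock);
    for (;;) {
        long (*fn)(long);
        long arg, r;

        while (px->state != PENDING && px->state != STOP)
            pthread_cond_wait(&px->cond, &px->lock);
        if (px->state == STOP)
            break;
        fn = px->fn;
        arg = px->arg;
        pthread_mutex_unlock(&px->lock);
        r = fn(arg); /* may block without stalling the caller's thread */
        pthread_mutex_lock(&px->lock);
        px->result = r;
        px->state = DONE;
        pthread_cond_broadcast(&px->cond);
    }
    pthread_mutex_unlock(&px->lock);
    return NULL;
}

/* Called on the goroutine thread; assumes a single caller per proxy. */
long proxy_call(struct proxy *px, long (*fn)(long), long arg)
{
    long r;

    pthread_mutex_lock(&px->lock);
    px->fn = fn;
    px->arg = arg;
    px->state = PENDING;
    pthread_cond_broadcast(&px->cond);
    while (px->state != DONE) /* a real runtime would run other work here */
        pthread_cond_wait(&px->cond, &px->lock);
    r = px->result;
    px->state = IDLE;
    pthread_mutex_unlock(&px->lock);
    return r;
}

/* Stand-in for a blocking system call. */
static long slow_double(long x)
{
    return 2 * x;
}

long proxy_demo(void)
{
    struct proxy px;
    pthread_t syscall_thread;
    long r;

    pthread_mutex_init(&px.lock, NULL);
    pthread_cond_init(&px.cond, NULL);
    px.state = IDLE;
    if (pthread_create(&syscall_thread, NULL, proxy_loop, &px) != 0)
        return -1;

    r = proxy_call(&px, slow_double, 21);

    pthread_mutex_lock(&px.lock);
    px.state = STOP;
    pthread_cond_broadcast(&px.cond);
    pthread_mutex_unlock(&px.lock);
    pthread_join(syscall_thread, NULL);

    pthread_cond_destroy(&px.cond);
    pthread_mutex_destroy(&px.lock);
    return r;
}
```

`proxy_demo()` returns 42. The sketch has a single syscall thread per runner, so it does not address what happens when that thread itself stays blocked for a long time.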

...


Alexander Tormasov

19 Jul

10 p.m.

...

...
Pthread model below golang runtime which is based on genode port of libc has a bit different model (at least emulated it). It assumes common space for created threads (which as mapped, correct me if I am wrong, to single OS thread in 1<->1 mode), and, at east, for common memory space and some subset of capabilities related to os resources, shared between threads. E.g. I assume that file, opened in one thread, I can use in another thread - therefore, related capability is common between them - it created from single session.

I think that works with TLS. At the user level. Another indirection on top of the native numbers. I think in the POSIX Library.

This is sufficient for my purpose. I do not use services directly, except for memory manipulation. The only problem with TLS is that in the current model it is bound to the stack's virtual address, while I need it to be bound to my threads...

...

...
Talking about golang model: they use common space for memory and descriptors and utilise as minimum os thread as possible. typically number of os threads equal number of processor cores, and it growth only in situation when thread blocked by syscall. in such case it stopped and new thread take from idle list or created from the scratch).

I did read your link. And if you disable the global queue it may be possible that go never executes code that is invalid on genode although it is included. And therefore not reliable. And that could be an issue with the philosophy of genode.

I don’t think that this will work: the global queue is a kind of natural load balancer and a way to continue execution if one CPU and its OS thread block… As a palliative solution we could have per-CPU «global» queues, but I am not sure that would work without a significant update of the scheduler code.

...

...
Talking about optimal model in genode, I think, if we can group a set of threads in the same way as we does with pthread model (making kind of «domain of threads with shared resources» as derivative from main one), then we can migrate code inside this «domain».

This «domain» is the group of user level threads pinned to one os level thread.

If this is what was mentioned above in relation to the POSIX subsystem emulation, then it will work even across different OS threads; all capabilities are hidden inside its implementation. I suppose that on Genode with POSIX thread emulation I could write the following code:

0. run the main OS thread in the standard way for libc+posix
1. create a mutex/blockade/any sync primitive and store it in the main thread (OS thread 0)
2. create pthread 1 with func1 (there will be an OS thread below POSIX thread 1)
3. in func1, set the primitive as busy
4. open a file, store the file descriptor in the common memory of main thread 0, and set the sync primitive free
5. in the main thread, create pthread 2 with func2, which will be another OS thread 2
6. in func2, wait for the sync primitive set in pthread 1 and, after it becomes free, read data from the descriptor from step 4

This code should work! But behind the fd there has to be a unique file descriptor mapped somehow to a capability, and it is created in thread 1 and used in thread 2, i.e. «semi-migrated» or shared (both are OS threads). There is no need to interpret the fd or do anything else (as with memory availability): everything is already done in posix+libc on top of Genode!

I want to have the same for golang; IMHO there is no reason to make it much more complex… Not everything will work, but the basics could be enough as a first step.
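Written out against plain POSIX, the scenario looks roughly like the following C sketch. It assumes a system with pthreads, unnamed semaphores and mkstemp(); whether Genode's libc port behaves the same way is the assumption made in this message, not something the code proves. The names `struct shared`, `func1()`, `func2()` and `fd_share_demo()` are hypothetical, and the semaphore starts at 0, so the primitive is "busy" from creation rather than being set busy inside func1.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <pthread.h>
#include <semaphore.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Memory common to the main thread and both pthreads. */
struct shared {
    sem_t ready;  /* the sync primitive of step 1 */
    int fd;       /* descriptor opened by pthread 1 */
    char buf[16];
    ssize_t nread;
};

/* Steps 3-4: open a file, publish its descriptor, release the primitive. */
static void *func1(void *arg)
{
    struct shared *s = arg;
    char path[] = "/tmp/fd_share_XXXXXX";

    s->fd = mkstemp(path);
    if (s->fd >= 0) {
        unlink(path); /* the file lives on only through the descriptor */
        if (write(s->fd, "hello", 5) != 5 || lseek(s->fd, 0, SEEK_SET) != 0) {
            close(s->fd);
            s->fd = -1;
        }
    }
    sem_post(&s->ready);
    return NULL;
}

/* Step 6: wait for the primitive, then read through the same descriptor. */
static void *func2(void *arg)
{
    struct shared *s = arg;

    sem_wait(&s->ready);
    s->nread = s->fd >= 0 ? read(s->fd, s->buf, sizeof s->buf - 1) : -1;
    return NULL;
}

int fd_share_demo(void)
{
    struct shared s;
    pthread_t t1, t2;
    int ok;

    memset(&s, 0, sizeof s);
    s.fd = -1;
    s.nread = -1;
    sem_init(&s.ready, 0, 0);

    if (pthread_create(&t1, NULL, func1, &s) != 0)
        return 0;
    if (pthread_create(&t2, NULL, func2, &s) != 0) {
        pthread_join(t1, NULL);
        return 0;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    /* 1 if the second thread read what the first thread wrote. */
    ok = s.nread == 5 && memcmp(s.buf, "hello", 5) == 0;

    if (s.fd >= 0)
        close(s.fd);
    sem_destroy(&s.ready);
    return ok;
}
```

On a conventional POSIX system `fd_share_demo()` returns 1, because descriptors belong to the process and not to the thread that opened them.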

...

...
in 4 above I should not switch to original os thread 1 - it is busy for some other goroutines...

No, you *can* switch to the original os level thread *but* not to the original user level thread in the os level thread or rather you first message the running original thread to do a user level switch to a dedicated user level thread which is there to wait on the mutex.

...
imho this is too heavyweight solution, and, moreover, golang do not have language VM and interpreter...

Not really. *But only if* you think the VM inside-out. All compiled code between preemption points is a *primitive* in this inside-out VM. And the instructions for the VM describe only the pattern for thread switches.

This assumes the possibility of moving user-thread-only goroutines between CPUs (each, as mentioned below, represented by 2 threads, one for syscalls and one for user code, preemptively scheduled by Genode itself?). So, with such an approach, can I use an arbitrary stack not taken from alloc_secondary_stack()? And if I really want to know which CPU/thread I am running on, how can this be done (e.g. for TLS operations)?

...

...
in such approach seems that we will utilise the only thread - while the main reason for existence of multiply threads is to run single thread per cpu, with switchable goroutines till any of them will be blocked.

No, the idea is to create 2 os level threads per CPU core. From the start of the program. Independent of code. Half of them run goroutines. The other half run the system calls of these goroutines. The needed task switches should be obvious.

And what happens if the thread responsible for syscalls on a particular CPU really blocks due to a Genode call (e.g. it blocks on a Genode mutex, or waits for a response because the read() syscall has to provide data from disk)? The OS thread can’t continue execution and we need to create another one (as implemented in the golang runtime)...

Uwe

21 Jul 21 Jul

2:49 p.m.

New subject: Aw: Re: How to switch thread stack between threads?

...

Sent: Monday, 19 July 2021 at 22:00
From: "Alexander Tormasov via users" users@lists.genode.org
To: "Genode users mailing list" users@lists.genode.org
Cc: "Alexander Tormasov" a.tormasov@innopolis.ru
Subject: Re: How to switch thread stack between threads?

...
...
The pthread model below the golang runtime, which is based on the Genode port of libc, has a slightly different model (or at least emulates one). It assumes a common space for created threads (each mapped, correct me if I am wrong, to a single OS thread in 1<->1 mode), at least for the common memory space and some subset of capabilities related to OS resources, shared between threads. E.g. I assume that a file opened in one thread can be used in another thread; therefore, the related capability is common between them - it is created from a single session.

I think that works with TLS. At the user level. Another indirection on top of the native numbers. I think in the POSIX Library.

This is sufficient for my purpose. I do not use services directly, except for memory manipulation. The only problem with TLS is that in the current model it is bound to the stack virtual address, while I need it to be bound to my threads...

...
...
Talking about the golang model: they use a common space for memory and descriptors and utilise as few OS threads as possible. Typically the number of OS threads equals the number of processor cores, and it grows only when a thread is blocked by a syscall. In such a case it is stopped and a new thread is taken from the idle list or created from scratch.

I did read your link. And if you disable the global queue it may be possible that go never executes code that is invalid on genode although it is included. And therefore not reliable. And that could be an issue with the philosophy of genode.

I don’t think that this will work: the global queue is a kind of natural load balancer and a way to continue execution if one CPU and its related OS thread block… As a palliative solution we can have per-CPU «global» queues. I am not sure that it will work without a significant update of the scheduler code.

Yes, I did understand that. But this load balancing doesn't work on genode. And if you can run your program at all, although not efficiently, would you choose that?

...

...
...
Talking about the optimal model in Genode: I think that if we can group a set of threads in the same way as we do with the pthread model (making a kind of «domain of threads with shared resources» derived from the main one), then we can migrate code inside this «domain».

This «domain» is the group of user level threads pinned to one os level thread.

If this is, as mentioned above, related to the POSIX subsystem emulation, then it will work even for different ones; all capabilities are hidden inside its implementation. I suppose that on Genode with POSIX thread emulation I could write the following code:

0. run the main OS thread in the standard way for libc+posix

1. create a mutex/blockade/any sync primitive and store it in the main thread - OS thread 0

2. create pthread 1 with func1 (where an OS thread will be below POSIX thread 1)

3. in func1 set the primitive as busy

4. open a file, store the file descriptor in the common memory of main thread 0, and set the sync primitive free

5. in the main thread create pthread 2 with func2, where it will be another OS thread 2

6. in func2 wait for the sync primitive set in pthread 1, and after it becomes free read data from the descriptor from 4 above

This code should work! But below the fd there should be a unique file descriptor mapped somehow to a capability - yet it is created in thread 1 and used in thread 2, i.e. «semi-migrated» or shared (both will be OS threads). There is no need to interpret the fd or do anything else (as with memory availability) - everything is already done in posix+libc on top of Genode!
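The steps above can be sketched in plain POSIX C. This is only an illustration under assumptions: the file name /dev/zero, the struct shared layout and the function run_shared_fd_demo() are made up for the example, and whether it behaves the same on Genode depends on the libc/VFS configuration. The primitive starts out busy in the main thread (instead of being set busy inside func1) so that func2 cannot slip through before func1 has run.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>

/* State in the common address space, visible to every thread (step 1). */
struct shared {
    pthread_mutex_t lock;
    pthread_cond_t  freed;
    int             is_free;   /* 0 = primitive busy, 1 = primitive free */
    int             fd;        /* descriptor published by func1 (step 4) */
    char            buf[64];
    ssize_t         nread;     /* bytes read by func2 (step 6) */
};

/* Steps 3-4: open a file, publish the descriptor, free the primitive. */
static void *func1(void *arg)
{
    struct shared *s = arg;
    int fd = open("/dev/zero", O_RDONLY);  /* hypothetical example file */

    pthread_mutex_lock(&s->lock);
    s->fd = fd;
    s->is_free = 1;
    pthread_cond_broadcast(&s->freed);
    pthread_mutex_unlock(&s->lock);
    return NULL;
}

/* Step 6: wait for the primitive, then read through the descriptor
 * that was opened by the other thread. */
static void *func2(void *arg)
{
    struct shared *s = arg;

    pthread_mutex_lock(&s->lock);
    while (!s->is_free)
        pthread_cond_wait(&s->freed, &s->lock);
    pthread_mutex_unlock(&s->lock);

    s->nread = read(s->fd, s->buf, sizeof s->buf);
    return NULL;
}

/* Steps 0-6 driven from the main thread; returns what func2 read. */
ssize_t run_shared_fd_demo(void)
{
    struct shared s = { .is_free = 0, .fd = -1, .nread = -1 };
    pthread_t t1, t2;
    int rc;

    pthread_mutex_init(&s.lock, NULL);
    pthread_cond_init(&s.freed, NULL);

    rc = pthread_create(&t1, NULL, func1, &s);   /* step 2 */
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, func2, &s);   /* step 5 */
    assert(rc == 0);
    (void)rc;
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    if (s.fd >= 0)
        close(s.fd);
    pthread_cond_destroy(&s.freed);
    pthread_mutex_destroy(&s.lock);
    return s.nread;
}
```

On a system where the example file can be opened, run_shared_fd_demo() returns the number of bytes that thread 2 read through the descriptor opened by thread 1; no part of the code has to know what sits below the fd.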

I want to have the same for golang; IMHO there is no reason to make it much more complex… Not everything will work, but the basics could be enough as a first step.

...
...
In 4 above I should not switch to the original OS thread 1 - it is busy with some other goroutines...

No, you *can* switch to the original OS-level thread, *but* not to the original user-level thread within that OS-level thread. Or rather, you first message the running original thread to do a user-level switch to a dedicated user-level thread which is there to wait on the mutex.

...
IMHO this is too heavyweight a solution and, moreover, golang does not have a language VM or interpreter...

Not really. *But only if* you think of the VM inside-out. All compiled code between preemption points is a *primitive* in this inside-out VM. And the instructions for the VM describe only the pattern for thread switches.

This assumes the possibility of moving user-thread-only goroutines between CPUs (as mentioned below, represented by 2 threads - one for syscalls and one for user code, preemptively scheduled by Genode itself?). So, can I use an arbitrary stack, not taken from alloc_secondary_stack(), for such an approach?

No, the thread stack must come from alloc_secondary_stack()! But nothing compels you to use the thread stack for storing local state or the return address. At entry you can do something like

    type some_func() {
        typedef struct locals { ...; } local;
        local *l = malloc(sizeof(struct locals));
        frame *f = save_return_to_heap(&l);
        ...
    }

And on return

    restore_return(f); return (...);

This moves all data to the heap while the function is running and restores the stack on return. The functions restore_return() and save_return_to_heap() have to be written in assembler.
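To make the "locals on the heap" part concrete, here is a minimal C sketch. It covers only the local state; saving and restoring the return address (save_return_to_heap() / restore_return() above) is not shown, since that needs assembler. The names struct locals, sum_begin(), sum_step() and sum_end() are invented for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Everything that would normally be a stack local lives in this heap frame. */
struct locals {
    int  i;     /* next value to add */
    int  n;     /* upper bound */
    long sum;   /* running total */
};

/* "Function entry": allocate the frame on the heap. Returns NULL on failure. */
struct locals *sum_begin(int n)
{
    assert(n >= 0);
    struct locals *l = malloc(sizeof *l);
    if (l) {
        l->i = 1;
        l->n = n;
        l->sum = 0;
    }
    return l;
}

/* Run at most `budget` iterations, then return to the caller.
 * Returns 1 once the loop has finished, 0 if more work remains.
 * Between two calls no state is kept on the thread stack, so the next
 * call may be made from a different stack or a different OS thread. */
int sum_step(struct locals *l, int budget)
{
    while (budget-- > 0 && l->i <= l->n) {
        l->sum += l->i;
        l->i++;
    }
    return l->i > l->n;
}

/* "Function return": hand back the result and release the heap frame. */
long sum_end(struct locals *l)
{
    long result = l->sum;
    free(l);
    return result;
}
```

A caller would do l = sum_begin(10), call sum_step(l, budget) repeatedly (possibly switching stacks in between) until it returns 1, and then fetch the total with sum_end(l).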

...

And if I really want to know on which CPU/thread I run, how can this be done (e.g. for TLS operations)?

...
...
With such an approach it seems that we will utilise only one thread, while the main reason for having multiple threads is to run a single thread per CPU, with switchable goroutines, until any of them blocks.

No, the idea is to create 2 os level threads per CPU core. From the start of the program. Independent of code. Half of them run goroutines. The other half run the system calls of these goroutines. The needed task switches should be obvious.
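A rough sketch of one such pair, in POSIX C: one thread stands in for the goroutine runner, the other performs the system call on its behalf. Everything here is an assumption for illustration (the one-slot struct mailbox, forwarded_write(), run_handoff_demo()); pinning to a CPU core is left out, and the runner simply waits for the result, whereas a real runtime would switch to another goroutine in the meantime.

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

enum { IDLE, REQUEST, DONE, SHUTDOWN };

/* One-slot mailbox between a goroutine-running thread and its syscall thread. */
struct mailbox {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    int             state;
    int             fd;       /* request arguments for write() */
    const void     *buf;
    size_t          len;
    ssize_t         result;
};

/* Syscall thread: sleeps until a request arrives, performs the possibly
 * blocking call outside the lock, then hands the result back. */
static void *syscall_thread(void *arg)
{
    struct mailbox *m = arg;

    pthread_mutex_lock(&m->lock);
    for (;;) {
        while (m->state != REQUEST && m->state != SHUTDOWN)
            pthread_cond_wait(&m->changed, &m->lock);
        if (m->state == SHUTDOWN)
            break;
        int fd = m->fd;
        const void *buf = m->buf;
        size_t len = m->len;
        pthread_mutex_unlock(&m->lock);
        ssize_t r = write(fd, buf, len);      /* the actual system call */
        pthread_mutex_lock(&m->lock);
        m->result = r;
        m->state = DONE;
        pthread_cond_broadcast(&m->changed);
    }
    pthread_mutex_unlock(&m->lock);
    return NULL;
}

/* Called on the goroutine-running thread: post the request, wait for the result. */
ssize_t forwarded_write(struct mailbox *m, int fd, const void *buf, size_t len)
{
    pthread_mutex_lock(&m->lock);
    m->fd = fd;
    m->buf = buf;
    m->len = len;
    m->state = REQUEST;
    pthread_cond_broadcast(&m->changed);
    while (m->state != DONE)
        pthread_cond_wait(&m->changed, &m->lock);
    ssize_t r = m->result;
    m->state = IDLE;
    pthread_mutex_unlock(&m->lock);
    return r;
}

/* Demo: forward one write() into a pipe and read the bytes back. */
ssize_t run_handoff_demo(void)
{
    struct mailbox m = { .state = IDLE };
    pthread_t sys;
    int p[2];
    char back[2];

    pthread_mutex_init(&m.lock, NULL);
    pthread_cond_init(&m.changed, NULL);
    int rc = pthread_create(&sys, NULL, syscall_thread, &m);
    assert(rc == 0);
    rc = pipe(p);
    assert(rc == 0);
    (void)rc;

    ssize_t written = forwarded_write(&m, p[1], "hi", 2);
    ssize_t got = read(p[0], back, sizeof back);

    pthread_mutex_lock(&m.lock);
    m.state = SHUTDOWN;                       /* ask the syscall thread to exit */
    pthread_cond_broadcast(&m.changed);
    pthread_mutex_unlock(&m.lock);
    pthread_join(sys, NULL);
    close(p[0]);
    close(p[1]);
    return written == 2 ? got : -1;
}
```

The point of the split is that a call which blocks only stalls the syscall thread; with one such pair per core, the thread count is fixed at program start.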

And what happens if the thread responsible for syscalls on a particular CPU really blocks due to a Genode call (e.g. it blocks on a Genode mutex, or waits for a response because the syscall read() has to provide data from disk)? The OS thread can’t continue execution and we need to create another one (as implemented in the golang runtime)...

There are only 2 syscalls that make a thread really wait. All others can be emulated by these 2. These 2 are select() and wait(). (I mean the ones expected by go, not the ones provided by genode) Both allow modification of parameters to make them query instead of wait. They can store their parameters in a list and use a hook in the preemption points of the go language. This hook yields to the syscall thread which yields back if the parameter list is empty. If the parameter list is not empty the parameters are restored from the list and the system is queried. On failure the thread is yielded back as in the case of an empty list. On success the waiting thread (stored in the parameter list) is unblocked with the guarantee that the intended syscall won't block. Whenever that thread runs it yields to the syscall thread with the intended syscall as message which is performed without undue delay and yielded back with the results.
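The "query instead of wait" idea for select() can be sketched like this in C. It is a simplified illustration, not the go runtime's code: the fixed-size pending_list, the integer waiter ids and the function names park_read_wait() and poll_pending() are assumptions, only readability is handled, and the yield between threads is left to the caller.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

#define MAX_PENDING 16

/* A parked wait: user-level thread `waiter` wants to read from `fd`. */
struct pending {
    int used;
    int fd;
    int waiter;   /* hypothetical user-level thread id */
};

static struct pending pending_list[MAX_PENDING];

/* Instead of blocking in select(), store the parameters in the list.
 * Returns 0 on success, -1 if the list is full. */
int park_read_wait(int fd, int waiter)
{
    assert(fd >= 0 && fd < FD_SETSIZE);
    for (int i = 0; i < MAX_PENDING; i++) {
        if (!pending_list[i].used) {
            pending_list[i] = (struct pending){ 1, fd, waiter };
            return 0;
        }
    }
    return -1;
}

/* Hook for a preemption point: query the system without waiting.
 * Returns the id of one waiter whose descriptor is readable (and removes
 * its entry), or -1 if the list is empty or nothing is ready yet. */
int poll_pending(void)
{
    fd_set rfds;
    struct timeval zero = { 0, 0 };   /* zero timeout: query, do not wait */
    int maxfd = -1;

    FD_ZERO(&rfds);
    for (int i = 0; i < MAX_PENDING; i++) {
        if (pending_list[i].used) {
            FD_SET(pending_list[i].fd, &rfds);
            if (pending_list[i].fd > maxfd)
                maxfd = pending_list[i].fd;
        }
    }
    if (maxfd < 0)
        return -1;                    /* empty list: yield back at once */
    if (select(maxfd + 1, &rfds, NULL, NULL, &zero) <= 0)
        return -1;                    /* nothing ready (or error) */

    for (int i = 0; i < MAX_PENDING; i++) {
        if (pending_list[i].used && FD_ISSET(pending_list[i].fd, &rfds)) {
            pending_list[i].used = 0;
            return pending_list[i].waiter;
        }
    }
    return -1;
}
```

When poll_pending() returns a waiter id, that user-level thread can be unblocked and its read() forwarded to the syscall thread, since the descriptor was just reported readable.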

...

Genode users mailing list users@lists.genode.org https://lists.genode.org/listinfo/users

participants (2)

Alexander Tormasov
Uwe