Hi Christian,
thank you for your input and clarifications. Let me sum up my experience of the debugging hell of the last few days.
Regarding your questions:
- Does the thread 13 really spin in the spinlock or does it reenter the spinlcok aquisition quite rapidly?
You were right to question about whether thread 13 spins around the spinlock. It looks like I took a misleading snapshot.
- Could this be data curruption due to invalid pointers or life-time issues?
I doubt that this is due to a gRPC bug. I created a simplified scenario [1] without gRPC where I could reproduce the/a deadlock (more information below).
- Does the C++ runtime support catching exceptions in Genode::Mutex while unwinding the occurence of another exception (_Unwind_Find_FDE)?
That's a good question. I tried to verify this by starting several pthreads that try to acquire the same mutex with following code snipped.
```c++ try { log("waiting for mutex"); Mutex::Guard guard(mutex()); log("mutex aquired. sleeping."); usleep(2000*1000); log("throw exception"); throw -1; } catch (...) { log("caught exception"); } ```
This does not yield any deadlock. After examining the problem for the last days, I'm not so sure anymore if throwing of an exception has any influence on the deadlock at all.
It is quite a challenge to really find out what the root problem of this is. Surely, it does not help if there is a huge library like gRPC involved. To eliminate any influence of gRPC, I created a scenario [1] that tries to boil down the problem using a combination of `pthread`, `lwip` and dynamic heap allocation with `std::string`. The scenario has one `_server_thread` that listens for new tcp connections and creates a connection thread for each accepted connection. The connection thread then allocates a `std::string` with 50k data, appends whatever was received and is then sent back. After 10 incoming sessions the threads that are done are cleaned up with `pthread_join()` from within the `_server_thread`. To stress this server, I use the same gRPC stress client [2] described in my last mail, since the data transferred shouldn't be relevant anyways.
After less than 100 connections this will result in a deadlock, where all the threads seem to be at a consistant state:
```gdb
(gdb) info threads Id Target Id Frame * 1 LWP 76221 "ld.lib.so" pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29 2 LWP 76227 "ld.lib.so" pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29 3 LWP 76231 "ld.lib.so" pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29 4 LWP 76239 "ld.lib.so" pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29 5 LWP 76736 "ld.lib.so" pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29 6 LWP 76742 "ld.lib.so" pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29 7 LWP 76748 "ld.lib.so" pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29 8 LWP 76749 "ld.lib.so" pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29 9 LWP 76755 "ld.lib.so" pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29 10 LWP 76761 "ld.lib.so" pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29 11 LWP 76762 "ld.lib.so" pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29 12 LWP 76763 "ld.lib.so" pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29 (gdb) thread apply all bt
Thread 12 (LWP 76763): #0 pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29 #1 0x000000005004efdd in lx_futex (val=0, op=0, uaddr=<optimized out>) at /home/user/genode/repos/base-linux/src/lib/syscall/linux_syscalls.h:436 #2 thread_stop_myself (myself=<optimized out>) at /home/user/genode/repos/base-linux/src/include/base/internal/lock_helper.h:62 #3 Genode::Cancelable_lock::lock (this=this@entry=0x40bfeb08, myself=...) at /home/user/genode/repos/base/src/lib/base/lock.cc:128 #4 0x000000005004f183 in Genode::Cancelable_lock::lock (this=this@entry=0x40bfeb08) at /home/user/genode/repos/base/src/lib/base/lock.cc:66 #5 0x000000000af520d6 in Genode::Lock::lock (this=this@entry=0x40bfeb08) at /home/user/genode/repos/base/include/base/lock.h:33 #6 0x000000000af60087 in Libc::Pthread_pool::suspend_myself (timeout_ms=<optimized out>, check=..., this=0xaff8878 <unmanaged_singleton<Libc::Kernel, 8, Genode::Env&, Genode::Allocator&>(Genode::Env&, Genode::Allocator&)::object_space+16112>) at /home/user/genode/repos/libports/src/lib/libc/internal/pthread_pool.h:89 #7 Libc::Kernel::suspend (this=0xaff4988 <unmanaged_singleton<Libc::Kernel, 8, Genode::Env&, Genode::Allocator&>(Genode::Env&, Genode::Allocator&)::object_space>, check=..., timeout_ms=<optimized out>) at /home/user/genode/repos/libports/src/lib/libc/internal/kernel.h:506 #8 0x000000000af8477e in Libc::Vfs_plugin::read (this=0xaff6dc0 <unmanaged_singleton<Libc::Kernel, 8, Genode::Env&, Genode::Allocator&>(Genode::Env&, Genode::Allocator&)::object_space+9272>, fd=<optimized out>, buf=0x40bfedd0, count=4096) at /home/user/genode/repos/libports/src/lib/libc/vfs_plugin.cc:885 #9 0x000000000af74178 in do_recvfrom (fd=<optimized out>, buf=0x40bfedd0, len=4096, src_addr=<optimized out>, src_addrlen=0x0, flags=<optimized out>) at /home/user/genode/repos/libports/src/lib/libc/socket_fs_plugin.cc:756 #10 0x000000000af7424b in Libc::Socket_fs::Plugin::read (this=<optimized out>, fd=<optimized out>, buf=<optimized out>, count=<optimized out>) at /home/user/genode/repos/libports/src/lib/libc/socket_fs_plugin.cc:1086 #11 0x0000000001000709 in Echo_server::Server_main::connection_handler (arg=0x7ffff7f7a240) at /home/user/genode/repos/libports/src/test/tcp_echo_server/main.cc:65 #12 0x000000000af66fd1 in Libc::Pthread::Thread_object::entry (this=0x7ffff7f5dfe8) at /home/user/genode/repos/libports/src/lib/libc/pthread.cc:71 #13 0x0000000050095f35 in Genode::Thread::_thread_start () at /home/user/genode/repos/base/include/base/output.h:54 #14 0x00000000500ab752 in thread_start () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_clone.S:62
Thread 11 (LWP 76762): #0 pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29 #1 0x000000005004efdd in lx_futex (val=0, op=0, uaddr=<optimized out>) at /home/user/genode/repos/base-linux/src/lib/syscall/linux_syscalls.h:436 #2 thread_stop_myself (myself=<optimized out>) at /home/user/genode/repos/base-linux/src/include/base/internal/lock_helper.h:62 #3 Genode::Cancelable_lock::lock (this=this@entry=0x40afeb08, myself=...) at /home/user/genode/repos/base/src/lib/base/lock.cc:128 #4 0x000000005004f183 in Genode::Cancelable_lock::lock (this=this@entry=0x40afeb08) at /home/user/genode/repos/base/src/lib/base/lock.cc:66 #5 0x000000000af520d6 in Genode::Lock::lock (this=this@entry=0x40afeb08) at /home/user/genode/repos/base/include/base/lock.h:33 #6 0x000000000af60087 in Libc::Pthread_pool::suspend_myself (timeout_ms=<optimized out>, check=..., this=0xaff8878 <unmanaged_singleton<Libc::Kernel, 8, Genode::Env&, Genode::Allocator&>(Genode::Env&, Genode::Allocator&)::object_space+16112>) at /home/user/genode/repos/libports/src/lib/libc/internal/pthread_pool.h:89 #7 Libc::Kernel::suspend (this=0xaff4988 <unmanaged_singleton<Libc::Kernel, 8, Genode::Env&, Genode::Allocator&>(Genode::Env&, Genode::Allocator&)::object_space>, check=..., timeout_ms=<optimized out>) at /home/user/genode/repos/libports/src/lib/libc/internal/kernel.h:506 #8 0x000000000af8477e in Libc::Vfs_plugin::read (this=0xaff6dc0 <unmanaged_singleton<Libc::Kernel, 8, Genode::Env&, Genode::Allocator&>(Genode::Env&, Genode::Allocator&)::object_space+9272>, fd=<optimized out>, buf=0x40afedd0, count=4096) at /home/user/genode/repos/libports/src/lib/libc/vfs_plugin.cc:885 #9 0x000000000af74178 in do_recvfrom (fd=<optimized out>, buf=0x40afedd0, len=4096, src_addr=<optimized out>, src_addrlen=0x0, flags=<optimized out>) at /home/user/genode/repos/libports/src/lib/libc/socket_fs_plugin.cc:756 #10 0x000000000af7424b in Libc::Socket_fs::Plugin::read (this=<optimized out>, fd=<optimized out>, buf=<optimized out>, count=<optimized out>) at /home/user/genode/repos/libports/src/lib/libc/socket_fs_plugin.cc:1086 #11 0x0000000001000709 in Echo_server::Server_main::connection_handler (arg=0x7ffff7f7a1f0) at /home/user/genode/repos/libports/src/test/tcp_echo_server/main.cc:65 #12 0x000000000af66fd1 in Libc::Pthread::Thread_object::entry (this=0x7ffff7f5dde8) at /home/user/genode/repos/libports/src/lib/libc/pthread.cc:71 #13 0x0000000050095f35 in Genode::Thread::_thread_start () at /home/user/genode/repos/base/include/base/output.h:54 #14 0x00000000500ab752 in thread_start () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_clone.S:62
Thread 10 (LWP 76761): #0 pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29 #1 0x000000005004efdd in lx_futex (val=0, op=0, uaddr=<optimized out>) at /home/user/genode/repos/base-linux/src/lib/syscall/linux_syscalls.h:436 #2 thread_stop_myself (myself=<optimized out>) at /home/user/genode/repos/base-linux/src/include/base/internal/lock_helper.h:62 #3 Genode::Cancelable_lock::lock (this=this@entry=0x409feb08, myself=...) at /home/user/genode/repos/base/src/lib/base/lock.cc:128 #4 0x000000005004f183 in Genode::Cancelable_lock::lock (this=this@entry=0x409feb08) at /home/user/genode/repos/base/src/lib/base/lock.cc:66 #5 0x000000000af520d6 in Genode::Lock::lock (this=this@entry=0x409feb08) at /home/user/genode/repos/base/include/base/lock.h:33 #6 0x000000000af60087 in Libc::Pthread_pool::suspend_myself (timeout_ms=<optimized out>, check=..., this=0xaff8878 <unmanaged_singleton<Libc::Kernel, 8, Genode::Env&, Genode::Allocator&>(Genode::Env&, Genode::Allocator&)::object_space+16112>) at /home/user/genode/repos/libports/src/lib/libc/internal/pthread_pool.h:89 #7 Libc::Kernel::suspend (this=0xaff4988 <unmanaged_singleton<Libc::Kernel, 8, Genode::Env&, Genode::Allocator&>(Genode::Env&, Genode::Allocator&)::object_space>, check=..., timeout_ms=<optimized out>) at /home/user/genode/repos/libports/src/lib/libc/internal/kernel.h:506 #8 0x000000000af8477e in Libc::Vfs_plugin::read (this=0xaff6dc0 <unmanaged_singleton<Libc::Kernel, 8, Genode::Env&, Genode::Allocator&>(Genode::Env&, Genode::Allocator&)::object_space+9272>, fd=<optimized out>, buf=0x409fedd0, count=4096) at /home/user/genode/repos/libports/src/lib/libc/vfs_plugin.cc:885 #9 0x000000000af74178 in do_recvfrom (fd=<optimized out>, buf=0x409fedd0, len=4096, src_addr=<optimized out>, src_addrlen=0x0, flags=<optimized out>) at /home/user/genode/repos/libports/src/lib/libc/socket_fs_plugin.cc:756 #10 0x000000000af7424b in Libc::Socket_fs::Plugin::read (this=<optimized out>, fd=<optimized out>, buf=<optimized out>, count=<optimized out>) at /home/user/genode/repos/libports/src/lib/libc/socket_fs_plugin.cc:1086 #11 0x0000000001000709 in Echo_server::Server_main::connection_handler (arg=0x7ffff7f7a1b0) at /home/user/genode/repos/libports/src/test/tcp_echo_server/main.cc:65 #12 0x000000000af66fd1 in Libc::Pthread::Thread_object::entry (this=0x7ffff7f5dbd8) at /home/user/genode/repos/libports/src/lib/libc/pthread.cc:71 #13 0x0000000050095f35 in Genode::Thread::_thread_start () at /home/user/genode/repos/base/include/base/output.h:54 #14 0x00000000500ab752 in thread_start () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_clone.S:62
Thread 9 (LWP 76755): #0 pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29 #1 0x000000005004efdd in lx_futex (val=0, op=0, uaddr=<optimized out>) at /home/user/genode/repos/base-linux/src/lib/syscall/linux_syscalls.h:436 #2 thread_stop_myself (myself=<optimized out>) at /home/user/genode/repos/base-linux/src/include/base/internal/lock_helper.h:62 #3 Genode::Cancelable_lock::lock (this=this@entry=0x408fecd0, myself=...) at /home/user/genode/repos/base/src/lib/base/lock.cc:128 #4 0x000000005004f183 in Genode::Cancelable_lock::lock (this=this@entry=0x408fecd0) at /home/user/genode/repos/base/src/lib/base/lock.cc:66 #5 0x00000000500753f0 in Genode::Lock::lock (this=<optimized out>) at /home/user/genode/repos/base/include/base/lock.h:33 #6 Genode::Blockade::block (this=<optimized out>) at /home/user/genode/repos/base/include/base/blockade.h:31 #7 Genode::sleep_forever () at /home/user/genode/repos/base/src/lib/base/sleep.cc:21 #8 0x000000000af66f72 in pthread_exit (value_ptr=value_ptr@entry=0x0) at /home/user/genode/repos/libports/src/lib/libc/pthread.cc:558 #9 0x00000000010007d7 in Echo_server::Server_main::connection_handler (arg=0x7ffff7f7a160) at /home/user/genode/repos/libports/src/test/tcp_echo_server/main.cc:91 #10 0x000000000af66fd1 in Libc::Pthread::Thread_object::entry (this=0x7ffff7f5d9d8) at /home/user/genode/repos/libports/src/lib/libc/pthread.cc:71 #11 0x0000000050095f35 in Genode::Thread::_thread_start () at /home/user/genode/repos/base/include/base/output.h:54 #12 0x00000000500ab752 in thread_start () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_clone.S:62
Thread 8 (LWP 76749): #0 pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29 #1 0x000000005004efdd in lx_futex (val=0, op=0, uaddr=<optimized out>) at /home/user/genode/repos/base-linux/src/lib/syscall/linux_syscalls.h:436 #2 thread_stop_myself (myself=<optimized out>) at /home/user/genode/repos/base-linux/src/include/base/internal/lock_helper.h:62 #3 Genode::Cancelable_lock::lock (this=this@entry=0x407fe0a8, myself=...) at /home/user/genode/repos/base/src/lib/base/lock.cc:128 #4 0x000000005004f183 in Genode::Cancelable_lock::lock (this=this@entry=0x407fe0a8) at /home/user/genode/repos/base/src/lib/base/lock.cc:66 #5 0x000000000af520d6 in Genode::Lock::lock (this=this@entry=0x407fe0a8) at /home/user/genode/repos/base/include/base/lock.h:33 #6 0x000000000af60087 in Libc::Pthread_pool::suspend_myself (timeout_ms=<optimized out>, check=..., this=0xaff8878 <unmanaged_singleton<Libc::Kernel, 8, Genode::Env&, Genode::Allocator&>(Genode::Env&, Genode::Allocator&)::object_space+16112>) at /home/user/genode/repos/libports/src/lib/libc/internal/pthread_pool.h:89 #7 Libc::Kernel::suspend (this=0xaff4988 <unmanaged_singleton<Libc::Kernel, 8, Genode::Env&, Genode::Allocator&>(Genode::Env&, Genode::Allocator&)::object_space>, check=..., timeout_ms=<optimized out>) at /home/user/genode/repos/libports/src/lib/libc/internal/kernel.h:506 #8 0x000000000af83346 in Libc::Vfs_plugin::write (this=0xaff6dc0 <unmanaged_singleton<Libc::Kernel, 8, Genode::Env&, Genode::Allocator&>(Genode::Env&, Genode::Allocator&)::object_space+9272>, fd=0x7ffff7feaa70, buf=<optimized out>, count=50099) at /home/user/genode/repos/libports/src/lib/libc/vfs_plugin.cc:785 #9 0x000000000af71e27 in do_sendto (fd=<optimized out>, buf=0x7ffff7f2c010, len=50099, dest_addr=<optimized out>, dest_addrlen=<optimized out>, flags=<optimized out>) at /home/user/genode/repos/libports/src/lib/libc/socket_fs_plugin.cc:176 #10 0x000000000af73668 in Libc::Socket_fs::Plugin::write (this=<optimized out>, fd=<optimized out>, buf=<optimized out>, count=<optimized out>) at /home/user/genode/repos/libports/src/lib/libc/socket_fs_plugin.cc:1099 #11 0x0000000001000b4f in Echo_server::Server_main::connection_handler (arg=0x7ffff7f7a120) at /home/user/genode/contrib/stdcxx-79ad8821269a443aab21bfc1d0e9b8913332b6d1/include/stdcxx/bits/basic_string.h:2313 #12 0x000000000af66fd1 in Libc::Pthread::Thread_object::entry (this=0x7ffff7f5d7c8) at /home/user/genode/repos/libports/src/lib/libc/pthread.cc:71 #13 0x0000000050095f35 in Genode::Thread::_thread_start () at /home/user/genode/repos/base/include/base/output.h:54 #14 0x00000000500ab752 in thread_start () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_clone.S:62
Thread 7 (LWP 76748): #0 pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29 #1 0x000000005004efdd in lx_futex (val=0, op=0, uaddr=<optimized out>) at /home/user/genode/repos/base-linux/src/lib/syscall/linux_syscalls.h:436 #2 thread_stop_myself (myself=<optimized out>) at /home/user/genode/repos/base-linux/src/include/base/internal/lock_helper.h:62 #3 Genode::Cancelable_lock::lock (this=this@entry=0x406fecd0, myself=...) at /home/user/genode/repos/base/src/lib/base/lock.cc:128 #4 0x000000005004f183 in Genode::Cancelable_lock::lock (this=this@entry=0x406fecd0) at /home/user/genode/repos/base/src/lib/base/lock.cc:66 #5 0x00000000500753f0 in Genode::Lock::lock (this=<optimized out>) at /home/user/genode/repos/base/include/base/lock.h:33 #6 Genode::Blockade::block (this=<optimized out>) at /home/user/genode/repos/base/include/base/blockade.h:31 #7 Genode::sleep_forever () at /home/user/genode/repos/base/src/lib/base/sleep.cc:21 #8 0x000000000af66f72 in pthread_exit (value_ptr=value_ptr@entry=0x0) at /home/user/genode/repos/libports/src/lib/libc/pthread.cc:558 #9 0x00000000010007d7 in Echo_server::Server_main::connection_handler (arg=0x7ffff7f7a0d0) at /home/user/genode/repos/libports/src/test/tcp_echo_server/main.cc:91 #10 0x000000000af66fd1 in Libc::Pthread::Thread_object::entry (this=0x7ffff7f5d5c8) at /home/user/genode/repos/libports/src/lib/libc/pthread.cc:71 #11 0x0000000050095f35 in Genode::Thread::_thread_start () at /home/user/genode/repos/base/include/base/output.h:54 #12 0x00000000500ab752 in thread_start () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_clone.S:62
Thread 6 (LWP 76742): #0 pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29 #1 0x000000005004efdd in lx_futex (val=0, op=0, uaddr=<optimized out>) at /home/user/genode/repos/base-linux/src/lib/syscall/linux_syscalls.h:436 #2 thread_stop_myself (myself=<optimized out>) at /home/user/genode/repos/base-linux/src/include/base/internal/lock_helper.h:62 #3 Genode::Cancelable_lock::lock (this=this@entry=0x405fecd0, myself=...) at /home/user/genode/repos/base/src/lib/base/lock.cc:128 #4 0x000000005004f183 in Genode::Cancelable_lock::lock (this=this@entry=0x405fecd0) at /home/user/genode/repos/base/src/lib/base/lock.cc:66 #5 0x00000000500753f0 in Genode::Lock::lock (this=<optimized out>) at /home/user/genode/repos/base/include/base/lock.h:33 #6 Genode::Blockade::block (this=<optimized out>) at /home/user/genode/repos/base/include/base/blockade.h:31 #7 Genode::sleep_forever () at /home/user/genode/repos/base/src/lib/base/sleep.cc:21 #8 0x000000000af66f72 in pthread_exit (value_ptr=value_ptr@entry=0x0) at /home/user/genode/repos/libports/src/lib/libc/pthread.cc:558 #9 0x00000000010007d7 in Echo_server::Server_main::connection_handler (arg=0x7ffff7f7a090) at /home/user/genode/repos/libports/src/test/tcp_echo_server/main.cc:91 #10 0x000000000af66fd1 in Libc::Pthread::Thread_object::entry (this=0x7ffff7f5d3b8) at /home/user/genode/repos/libports/src/lib/libc/pthread.cc:71 #11 0x0000000050095f35 in Genode::Thread::_thread_start () at /home/user/genode/repos/base/include/base/output.h:54 #12 0x00000000500ab752 in thread_start () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_clone.S:62
Thread 5 (LWP 76736): #0 pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29 #1 0x000000005004efdd in lx_futex (val=0, op=0, uaddr=<optimized out>) at /home/user/genode/repos/base-linux/src/lib/syscall/linux_syscalls.h:436 #2 thread_stop_myself (myself=<optimized out>) at /home/user/genode/repos/base-linux/src/include/base/internal/lock_helper.h:62 #3 Genode::Cancelable_lock::lock (this=this@entry=0x418fecd0, myself=...) at /home/user/genode/repos/base/src/lib/base/lock.cc:128 #4 0x000000005004f183 in Genode::Cancelable_lock::lock (this=this@entry=0x418fecd0) at /home/user/genode/repos/base/src/lib/base/lock.cc:66 #5 0x00000000500753f0 in Genode::Lock::lock (this=<optimized out>) at /home/user/genode/repos/base/include/base/lock.h:33 #6 Genode::Blockade::block (this=<optimized out>) at /home/user/genode/repos/base/include/base/blockade.h:31 #7 Genode::sleep_forever () at /home/user/genode/repos/base/src/lib/base/sleep.cc:21 #8 0x000000000af66f72 in pthread_exit (value_ptr=value_ptr@entry=0x0) at /home/user/genode/repos/libports/src/lib/libc/pthread.cc:558 #9 0x00000000010007d7 in Echo_server::Server_main::connection_handler (arg=0x7ffff7f7a5e0) at /home/user/genode/repos/libports/src/test/tcp_echo_server/main.cc:91 #10 0x000000000af66fd1 in Libc::Pthread::Thread_object::entry (this=0x7ffff7f5ea18) at /home/user/genode/repos/libports/src/lib/libc/pthread.cc:71 #11 0x0000000050095f35 in Genode::Thread::_thread_start () at /home/user/genode/repos/base/include/base/output.h:54 #12 0x00000000500ab752 in thread_start () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_clone.S:62
Thread 4 (LWP 76239): #0 pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29 #1 0x000000005004efdd in lx_futex (val=0, op=0, uaddr=<optimized out>) at /home/user/genode/repos/base-linux/src/lib/syscall/linux_syscalls.h:436 #2 thread_stop_myself (myself=<optimized out>) at /home/user/genode/repos/base-linux/src/include/base/internal/lock_helper.h:62 #3 Genode::Cancelable_lock::lock (this=this@entry=0x404ff1f8, myself=...) at /home/user/genode/repos/base/src/lib/base/lock.cc:128 #4 0x000000005004f183 in Genode::Cancelable_lock::lock (this=this@entry=0x404ff1f8) at /home/user/genode/repos/base/src/lib/base/lock.cc:66 #5 0x000000000af520d6 in Genode::Lock::lock (this=this@entry=0x404ff1f8) at /home/user/genode/repos/base/include/base/lock.h:33 #6 0x000000000af60087 in Libc::Pthread_pool::suspend_myself (timeout_ms=<optimized out>, check=..., this=0xaff8878 <unmanaged_singleton<Libc::Kernel, 8, Genode::Env&, Genode::Allocator&>(Genode::Env&, Genode::Allocator&)::object_space+16112>) at /home/user/genode/repos/libports/src/lib/libc/internal/pthread_pool.h:89 #7 Libc::Kernel::suspend (this=0xaff4988 <unmanaged_singleton<Libc::Kernel, 8, Genode::Env&, Genode::Allocator&>(Genode::Env&, Genode::Allocator&)::object_space>, check=..., timeout_ms=<optimized out>) at /home/user/genode/repos/libports/src/lib/libc/internal/kernel.h:506 #8 0x000000000af8477e in Libc::Vfs_plugin::read (this=0xaff6dc0 <unmanaged_singleton<Libc::Kernel, 8, Genode::Env&, Genode::Allocator&>(Genode::Env&, Genode::Allocator&)::object_space+9272>, fd=<optimized out>, buf=0x404ff640, count=511) at /home/user/genode/repos/libports/src/lib/libc/vfs_plugin.cc:885 #9 0x000000000af7a781 in Libc::Socket_fs::Context::_read_socket_path (this=0x7ffff7f567e0) at /home/user/genode/repos/os/include/os/path.h:239 #10 Libc::Socket_fs::Context::Context (handle_fd=27, proto=Libc::Socket_fs::Context::TCP, this=0x7ffff7f567e0) at /home/user/genode/repos/libports/src/lib/libc/socket_fs_plugin.cc:190 #11 socket_fs_accept (libc_fd=libc_fd@entry=4, addr=addr@entry=0x404ffbd0, addrlen=addrlen@entry=0x404ff95c) at /home/user/genode/repos/libports/src/lib/libc/socket_fs_plugin.cc:563 #12 0x000000000af7d5d6 in __sys_accept (libc_fd=libc_fd@entry=4, addr=addr@entry=0x404ffbd0, addrlen=addrlen@entry=0x404ff95c) at /home/user/genode/repos/libports/src/lib/libc/socket_operations.cc:80 #13 0x00000000010083ee in Echo_server::Server_main::start_server (envp=<optimized out>) at /home/user/genode/repos/libports/src/test/tcp_echo_server/main.cc:135 #14 0x000000000af66fd1 in Libc::Pthread::Thread_object::entry (this=0x7ffff7f7d068) at /home/user/genode/repos/libports/src/lib/libc/pthread.cc:71 #15 0x0000000050095f35 in Genode::Thread::_thread_start () at /home/user/genode/repos/base/include/base/output.h:54 #16 0x00000000500ab752 in thread_start () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_clone.S:62
Thread 3 (LWP 76231): #0 pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29 #1 0x000000005004dee5 in lx_recvmsg (flags=0, msg=0x402ff9b0, sockfd=...) at /home/user/genode/repos/base-linux/src/lib/syscall/linux_syscalls.h:208 #2 Genode::ipc_call (dst=..., snd_msgbuf=..., rcv_msgbuf=...) at /home/user/genode/repos/base-linux/src/lib/base/ipc.cc:343 #3 0x0000000050091a47 in Genode::CapabilityGenode::Signal_source::_callGenode::Signal_source::Rpc_wait_for_signal (this=this@entry=0x501ba830 <unmanaged_singleton<Genode::Constructible<Signal_handler_thread>, 8>()::object_space+208>, args=...) at /home/user/genode/repos/base/include/base/native_capability.h:62 #4 0x0000000050091dad in Genode::CapabilityGenode::Signal_source::callGenode::Signal_source::Rpc_wait_for_signal (this=0x501ba830 <unmanaged_singleton<Genode::Constructible<Signal_handler_thread>, 8>()::object_space+208>) at /home/user/genode/repos/base/include/base/capability.h:163 #5 Genode::Rpc_clientGenode::Signal_source::callGenode::Signal_source::Rpc_wait_for_signal (this=0x501ba828 <unmanaged_singleton<Genode::Constructible<Signal_handler_thread>, 8>()::object_space+200>) at /home/user/genode/repos/base/include/base/rpc_client.h:192 #6 Genode::Signal_source_client::wait_for_signal (this=0x501ba828 <unmanaged_singleton<Genode::Constructible<Signal_handler_thread>, 8>()::object_space+200>) at /home/user/genode/repos/base/src/include/signal_source/client.h:28 #7 Genode::Signal_receiver::dispatch_signals (signal_source=0x501ba828 <unmanaged_singleton<Genode::Constructible<Signal_handler_thread>, 8>()::object_space+200>) at /home/user/genode/repos/base/src/lib/base/signal.cc:304 #8 0x00000000500921c0 in Signal_handler_thread::entry (this=0x501ba760 <unmanaged_singleton<Genode::Constructible<Signal_handler_thread>, 8>()::object_space>) at /home/user/genode/repos/base/include/util/reconstructible.h:151 #9 0x0000000050095f35 in Genode::Thread::_thread_start () at /home/user/genode/repos/base/include/base/output.h:54 #10 0x00000000500ab752 in thread_start () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_clone.S:62
Thread 2 (LWP 76227): #0 pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29 #1 0x000000005004fee4 in lx_epoll_wait (timeout=-1, maxevents=1, events=0x401ff8a4, epoll=...) at /home/user/genode/repos/base-linux/src/lib/syscall/linux_syscalls.h:262 #2 Genode::Native_thread::Epoll::poll (this=this@entry=0x401fffe8) at /home/user/genode/repos/base-linux/src/lib/base/native_thread.cc:80 #3 0x000000005004e5b9 in Genode::ipc_reply_wait (last_caller=..., exc=..., exc@entry=..., reply_msg=..., request_msg=...) at /home/user/genode/repos/base-linux/src/lib/base/ipc.cc:396 #4 0x0000000050067714 in Genode::Rpc_entrypoint::entry (this=0x500fcea8 Genode::bootstrap_component()::startup+360) at /home/user/genode/repos/base/src/lib/base/rpc_dispatch_loop.cc:66 #5 0x0000000050095f35 in Genode::Thread::_thread_start () at /home/user/genode/repos/base/include/base/output.h:54 #6 0x00000000500ab752 in thread_start () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_clone.S:62
Thread 1 (LWP 76221): #0 pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29 #1 0x000000005004efdd in lx_futex (val=0, op=0, uaddr=<optimized out>) at /home/user/genode/repos/base-linux/src/lib/syscall/linux_syscalls.h:436 #2 thread_stop_myself (myself=<optimized out>) at /home/user/genode/repos/base-linux/src/include/base/internal/lock_helper.h:62 #3 Genode::Cancelable_lock::lock (this=this@entry=0x400ffad0, myself=...) at /home/user/genode/repos/base/src/lib/base/lock.cc:128 #4 0x000000005004f183 in Genode::Cancelable_lock::lock (this=this@entry=0x400ffad0) at /home/user/genode/repos/base/src/lib/base/lock.cc:66 #5 0x0000000050090c4a in Genode::Lock::lock (this=0x400ffad0) at /home/user/genode/repos/base/include/base/lock.h:33 #6 Genode::Blockade::block (this=0x400ffad0) at /home/user/genode/repos/base/include/base/blockade.h:31 #7 Genode::Semaphore::down (this=0x500fdaf0 Genode::bootstrap_component()::startup+3504, this@entry=0x400ffad0) at /home/user/genode/repos/base/include/base/semaphore.h:101 #8 Genode::Signal_receiver::block_for_signal (this=this@entry=0x500fdaf0 Genode::bootstrap_component()::startup+3504) at /home/user/genode/repos/base/src/lib/base/signal.cc:235 #9 0x00000000500480ca in Genode::Entrypoint::_process_incoming_signals (this=0x500fcea0 Genode::bootstrap_component()::startup+352) at /home/user/genode/repos/base/include/util/reconstructible.h:59 #10 0x0000000050048fcd in Genode::Entrypoint::Entrypoint (this=0x500fcea0 Genode::bootstrap_component()::startup+352, env=...) at /home/user/genode/repos/base/src/lib/base/entrypoint.cc:360 #11 0x000000005004061d in Genode::Startup::Startup (this=0x500fcd40 Genode::bootstrap_component()::startup) at /home/user/genode/repos/base/src/lib/base/component.cc:74 #12 Genode::bootstrap_component () at /home/user/genode/repos/base/src/lib/base/component.cc:254 #13 0x00000000500ab396 in _main () at /home/user/genode/repos/base/src/lib/startup/_main.cc:255 #14 0x0000000000000000 in ?? ()
```
Continuing and stopping the process over and over again results in the same backtraces of the threads. Looking at the backtraces, nothing seems unusual. A few are stopping, a few are reading and one is writing. None of the `Genode::Lock`s are pointing to the same address. I'm lost. Is there anything I'm missing? I start to believe `lwip` is the root cause here...
[1] https://github.com/sidhussmann/genode/tree/lwip-pthread-deadlock [2] https://github.com/sidhussmann/grpc-connection-stress
Cheers, Sid