Dear Genodians,
during stress testing of a gRPC server with parallel connections, I am witnessing a race condition where the gRPC server component becomes unresponsive while still being able to respond to Genode heartbeats. Examining the different threads with GDB after the deadlock occurs, I noticed one thread waiting at `spinlock_lock` after `Lwip::File_system::write()` threw an `Insufficient_buffer` exception:
```gdb
(gdb) t 13
[Switching to thread 13 (LWP 520409)]
#0 spinlock_lock (lock_variable=lock_variable@entry=0x501cad40 <Linker::mutex()::_mutex>) at /home/user/genode/repos/base/include/spec/x86/cpu/atomic.h:51
51          return tmp == cmp_val;
(gdb) bt
#0 spinlock_lock (lock_variable=lock_variable@entry=0x501cad40 <Linker::mutex()::_mutex>) at /home/user/genode/repos/base/include/spec/x86/cpu/atomic.h:51
#1 Genode::Cancelable_lock::lock (this=this@entry=0x501cad40 <Linker::mutex()::_mutex>, myself=...) at /home/user/genode/repos/base/src/lib/base/lock.cc:71
#2 0x000000005004fa02 in Genode::Mutex::acquire (this=this@entry=0x501cad40 <Linker::mutex()::_mutex>) at /home/user/genode/repos/base/src/lib/base/mutex.cc:27
#3 0x00000000500bb050 in Genode::Mutex::Guard::Guard (mutex=..., this=<synthetic pointer>) at /home/user/genode/repos/base/include/base/mutex.h:40
#4 dl_iterate_phdr (callback=callback@entry=0x500a3a40 <_Unwind_IteratePhdrCallback>, data=data@entry=0x41ffa1b0) at /home/user/genode/repos/base/src/lib/ldso/exception.cc:41
#5 0x00000000500a4caf in _Unwind_Find_FDE (pc=0xaf828f2 <Libc::Vfs_plugin::write(Libc::File_descriptor*, void const*, unsigned long)+722>, bases=bases@entry=0x41ffa418) at /data/genode/contrib/gcc-893d85767720fc2b3c695c56d737d2307b1f2229/src/noux-pkg/gcc/libgcc/unwind-dw2-fde-dip.c:469
#6 0x00000000500a12a3 in uw_frame_state_for (context=context@entry=0x41ffa370, fs=fs@entry=0x41ffa460) at /data/genode/contrib/gcc-893d85767720fc2b3c695c56d737d2307b1f2229/src/noux-pkg/gcc/libgcc/unwind-dw2.c:1257
#7 0x00000000500a2bdb in _Unwind_RaiseException (exc=0x501be8e8 <Genode::init_cxx_heap(Genode::Env&)::initial_block+7528>) at /data/genode/contrib/gcc-893d85767720fc2b3c695c56d737d2307b1f2229/src/noux-pkg/gcc/libgcc/unwind.inc:104
#8 0x00000000500a98c5 in __cxa_throw () at /data/genode/contrib/gcc-893d85767720fc2b3c695c56d737d2307b1f2229/src/noux-pkg/gcc/libgcc/unwind-dw2-fde.h:157
#9 0x000000000ad17404 in non-virtual thunk to Lwip::File_system::write(Vfs::Vfs_handle*, char const*, unsigned long long, unsigned long long&) () from vfs_lwip.lib.so
#10 0x000000000af828f3 in Libc::Vfs_plugin::write (this=0xaff6dc0 <unmanaged_singleton<Libc::Kernel, 8, Genode::Env&, Genode::Allocator&>(Genode::Env&, Genode::Allocator&)::object_space+9272>, fd=0x7ffff7c586b0, buf=0x7ffff7abfd6e, count=3419) at /home/user/genode/repos/libports/src/lib/libc/vfs_plugin.cc:806
#11 0x000000000af711e7 in do_sendto (fd=<optimized out>, buf=0x7ffff7abfd6e, len=3419, dest_addr=<optimized out>, dest_addrlen=<optimized out>, flags=<optimized out>) at /home/user/genode/repos/libports/src/lib/libc/socket_fs_plugin.cc:176
#12 0x000000000af72a28 in Libc::Socket_fs::Plugin::write (this=<optimized out>, fd=<optimized out>, buf=<optimized out>, count=<optimized out>) at /home/user/genode/repos/libports/src/lib/libc/socket_fs_plugin.cc:1099
#13 0x00000000011c9f60 in tcp_send (fd=84, msg=msg@entry=0x41ffb2f0, additional_flags=additional_flags@entry=0) at /home/user/genode/contrib/protobuf_grpc-3414c16ccc9b43fa990653590ac986adf8780fce/src/lib/grpc/src/core/lib/iomgr/tcp_posix.cc:958
#14 0x00000000011ca18a in tcp_flush (tcp=tcp@entry=0x7ffff7d7da80, error=error@entry=0x41fff238) at /home/user/genode/contrib/protobuf_grpc-3414c16ccc9b43fa990653590ac986adf8780fce/src/lib/grpc/src/core/lib/iomgr/tcp_posix.cc:1499
#15 0x00000000011ca56b in tcp_write (ep=0x7ffff7d7da80, buf=<optimized out>, cb=<optimized out>, arg=<optimized out>) at /home/user/genode/contrib/protobuf_grpc-3414c16ccc9b43fa990653590ac986adf8780fce/src/lib/grpc/src/core/lib/iomgr/tcp_posix.cc:1636
#16 0x000000000116c6d0 in write_action (gt=0x7ffff7e90e20) at /home/user/genode/contrib/protobuf_grpc-3414c16ccc9b43fa990653590ac986adf8780fce/src/lib/grpc/src/core/lib/iomgr/closure.h:110
#17 write_action_begin_locked (gt=0x7ffff7e90e20) at /home/user/genode/contrib/protobuf_grpc-3414c16ccc9b43fa990653590ac986adf8780fce/src/lib/grpc/src/core/ext/transport/chttp2/transport/chttp2_transport.cc:956
#18 0x00000000011b25c1 in grpc_combiner_continue_exec_ctx () at /home/user/genode/contrib/protobuf_grpc-3414c16ccc9b43fa990653590ac986adf8780fce/src/lib/grpc/src/core/lib/iomgr/combiner.cc:252
#19 0x00000000011b9818 in grpc_core::ExecCtx::Flush (this=this@entry=0x41fff3b0) at /home/user/genode/contrib/protobuf_grpc-3414c16ccc9b43fa990653590ac986adf8780fce/src/lib/grpc/src/core/lib/iomgr/exec_ctx.cc:156
#20 0x0000000001217b64 in grpc_core::ExecCtx::~ExecCtx (this=0x41fff3b0, __in_chrg=<optimized out>) at /home/user/genode/contrib/protobuf_grpc-3414c16ccc9b43fa990653590ac986adf8780fce/src/lib/grpc/src/core/lib/iomgr/exec_ctx.h:124
#21 grpc_call_start_batch (call=<optimized out>, ops=0x41fff470, nops=3, tag=0x41fff8b0, reserved=<optimized out>) at /home/user/genode/contrib/protobuf_grpc-3414c16ccc9b43fa990653590ac986adf8780fce/src/lib/grpc/src/core/lib/surface/call.cc:1973
#22 0x0000000001006eaa in grpc::internal::CallOpSet<grpc::internal::CallOpSendInitialMetadata, grpc::internal::CallOpSendMessage, grpc::internal::CallOpServerSendStatus, grpc::internal::CallNoOp<4>, grpc::internal::CallNoOp<5>, grpc::internal::CallNoOp<6> >::ContinueFillOpsAfterInterception (this=0x41fff8b0) at /home/user/genode/contrib/protobuf_grpc-3414c16ccc9b43fa990653590ac986adf8780fce/include/grpcpp/impl/codegen/call.h:72
#23 0x00000000010067ec in grpc::internal::CallOpSet<grpc::internal::CallOpSendInitialMetadata, grpc::internal::CallOpSendMessage, grpc::internal::CallOpServerSendStatus, grpc::internal::CallNoOp<4>, grpc::internal::CallNoOp<5>, grpc::internal::CallNoOp<6> >::FillOps (this=0x41fff8b0, call=<optimized out>) at /home/user/genode/contrib/protobuf_grpc-3414c16ccc9b43fa990653590ac986adf8780fce/include/grpcpp/impl/codegen/interceptor_common.h:235
#24 0x0000000001008aa4 in grpc::internal::Call::PerformOps (ops=0x41fff8b0, this=<optimized out>) at /home/user/genode/contrib/protobuf_grpc-3414c16ccc9b43fa990653590ac986adf8780fce/include/grpcpp/impl/codegen/call.h:69
#25 grpc_impl::internal::RpcMethodHandler<helloworld::Greeter::Service, helloworld::HelloRequest, helloworld::HelloReply>::RunHandler (this=this@entry=0x7ffff7f55300, param=...) at /home/user/genode/contrib/protobuf_grpc-3414c16ccc9b43fa990653590ac986adf8780fce/include/grpcpp/impl/codegen/method_handler_impl.h:90
#26 0x000000000126891a in grpc_impl::Server::SyncRequest::CallData::ContinueRunAfterInterception (this=this@entry=0x7ffff7cdb650) at /home/user/genode/contrib/stdcxx-79ad8821269a443aab21bfc1d0e9b8913332b6d1/include/stdcxx/bits/move.h:182
#27 0x000000000126916c in grpc_impl::Server::SyncRequest::CallData::Run (resources=true, global_callbacks=..., this=0x7ffff7cdb650) at /home/user/genode/contrib/protobuf_grpc-3414c16ccc9b43fa990653590ac986adf8780fce/src/lib/grpc/src/cpp/server/server_cc.cc:489
#28 grpc_impl::Server::SyncRequestThreadManager::DoWork (this=<optimized out>, tag=<optimized out>, ok=<optimized out>, resources=<optimized out>) at /home/user/genode/contrib/protobuf_grpc-3414c16ccc9b43fa990653590ac986adf8780fce/src/lib/grpc/src/cpp/server/server_cc.cc:900
#29 0x00000000012afa66 in grpc::ThreadManager::MainWorkLoop (this=0x7ffff7f4f460) at /home/user/genode/contrib/protobuf_grpc-3414c16ccc9b43fa990653590ac986adf8780fce/src/lib/grpc/src/cpp/thread_manager/thread_manager.cc:211
#30 0x00000000012afcb0 in grpc::ThreadManager::WorkerThread::Run (this=0x7ffff7eae6e0) at /home/user/genode/contrib/protobuf_grpc-3414c16ccc9b43fa990653590ac986adf8780fce/src/lib/grpc/src/cpp/thread_manager/thread_manager.cc:43
#31 0x00000000011ae0ac in grpc_core::(anonymous namespace)::ThreadInternalsPosix::<lambda(void*)>::operator() (__closure=0x0, v=<optimized out>) at /home/user/genode/contrib/protobuf_grpc-3414c16ccc9b43fa990653590ac986adf8780fce/src/lib/grpc/src/core/lib/gprpp/thd_posix.cc:140
#32 grpc_core::(anonymous namespace)::ThreadInternalsPosix::<lambda(void*)>::_FUN(void *) () at /home/user/genode/contrib/protobuf_grpc-3414c16ccc9b43fa990653590ac986adf8780fce/src/lib/grpc/src/core/lib/gprpp/thd_posix.cc:145
#33 0x000000000af66391 in Libc::Pthread::Thread_object::entry (this=0x7ffff7d374f8) at /home/user/genode/repos/libports/src/lib/libc/pthread.cc:71
#34 0x0000000050096295 in Genode::Thread::_thread_start () at /home/user/genode/repos/base/include/base/output.h:54
#35 0x00000000500abab2 in thread_start () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_clone.S:62
(gdb)
```
At this point, I also notice many threads being stuck after a `pthread_exit` call:
```gdb
(gdb) t 10
[Switching to thread 10 (LWP 520056)]
#0 pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29
29          ret /* Return to caller. */
(gdb) bt
#0 pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29
#1 0x000000005004f3fd in lx_futex (val=0, op=0, uaddr=<optimized out>) at /home/user/genode/repos/base-linux/src/lib/syscall/linux_syscalls.h:436
#2 thread_stop_myself (myself=<optimized out>) at /home/user/genode/repos/base-linux/src/include/base/internal/lock_helper.h:62
#3 Genode::Cancelable_lock::lock (this=this@entry=0x426ffd60, myself=...) at /home/user/genode/repos/base/src/lib/base/lock.cc:128
#4 0x000000005004f5a3 in Genode::Cancelable_lock::lock (this=this@entry=0x426ffd60) at /home/user/genode/repos/base/src/lib/base/lock.cc:66
#5 0x0000000050075990 in Genode::Lock::lock (this=<optimized out>) at /home/user/genode/repos/base/include/base/lock.h:33
#6 Genode::Blockade::block (this=<optimized out>) at /home/user/genode/repos/base/include/base/blockade.h:31
#7 Genode::sleep_forever () at /home/user/genode/repos/base/src/lib/base/sleep.cc:21
#8 0x000000000af66332 in pthread_exit (value_ptr=0x0) at /home/user/genode/repos/libports/src/lib/libc/pthread.cc:558
#9 0x000000000af663a3 in Libc::Pthread::Thread_object::entry (this=0x7ffff7d7a078) at /home/user/genode/repos/libports/src/lib/libc/pthread.cc:71
#10 0x0000000050096295 in Genode::Thread::_thread_start () at /home/user/genode/repos/base/include/base/output.h:54
#11 0x00000000500abab2 in thread_start () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_clone.S:62
(gdb)
```
Despite our best efforts we have failed to pinpoint the problem so far. A few questions:
1. Is the spinlock really a problem here?
2. I was able to reproduce this problem using the newest version of gRPC [1] and the current `genodelabs/staging` branch. Reading [0] makes me wonder if the problem can be seen as "yet another instance where the VFS synchronization has bitten us".
3. Unlike `lwip`, `lxip` seems to give up after just one or two connections with this warning: `_wait_and_dispatch_one_io_signal called from non-entrypoint thread "pthread.5"`. Might this issue be related, or should we just focus on `lwip`?
4. In the GDB backtrace of `thread 13`, e.g. at frame `#5`, I notice a path to `libgcc` that does not belong to my machine. Is this negligible, or does it mess with my GDB output?
We just updated the `grpc` and `protobuf` libraries [1], which we will publish shortly. The problem, however, persists with both versions. To reproduce this issue, please check out branch [2] and run `grpc-server-lwip` under Linux. Multiple parallel client connections can be started by running `./stress_parallel_execute.sh 10.0.2.55:8899` from [3]. Note that it sometimes takes a few thousand parallel connections to reproduce the problem.
[0] https://github.com/genodelabs/genode/issues/3738#issuecomment-620496337
[1] https://github.com/trimpim/genode-world/tree/xxxx-update_protobuf_grpc_port
[2] https://github.com/sidhussmann/genode-world/tree/grpc-server-race-condition
[3] https://github.com/sidhussmann/grpc-connection-stress
Cheers, Sid
Hello Sid,
that's an impressive backtrace ;-) which raises the following questions.
1) Does thread 13 really spin in the spinlock, or does it re-enter the spinlock acquisition quite rapidly?
2) Could this be data corruption due to invalid pointers or lifetime issues?
3) Does the C++ runtime support catching exceptions in Genode::Mutex while unwinding the occurrence of another exception (_Unwind_Find_FDE)?
Regarding your questions...
On Wed, May 06, 2020 at 16:14:22 CEST, Sid Hussmann wrote:
At this point, I also notice many threads being stuck after a `pthread_exit` call:
This is expected, as the exiting threads block until they are destroyed in pthread_join() by another thread. On most supported kernels, a thread cannot destroy itself.
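For illustration, a minimal sketch of the reaping pattern the libc expects (plain POSIX, not taken from your scenario):

```c++
#include <pthread.h>
#include <cstdio>

static void *worker(void *)
{
    /* on Genode's libc, the exiting thread parks itself here
       (sleep_forever) until another thread reaps it via pthread_join() */
    pthread_exit(nullptr);
}

int main()
{
    pthread_t t;
    pthread_create(&t, nullptr, worker, nullptr);
    pthread_join(t, nullptr); /* destroys the exited thread's resources */
    printf("worker reaped\n");
    return 0;
}
```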
Despite our best efforts we have failed to pinpoint the problem so far. A few questions:
- Is the spinlock really a problem here?
The spinlock implementation should not be the problem; the storage location of the mutex might be.
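To make that concern concrete, a hypothetical sketch (not your code) of how a mutex's storage can become the problem:

```c++
#include <base/mutex.h>

struct Connection
{
    Genode::Mutex _mutex { };

    void send()
    {
        Genode::Mutex::Guard guard { _mutex };
        /* ... */
    }
};

void broken_teardown(Connection *c)
{
    /* if another thread is still blocked inside c->send(), deleting the
       object frees the memory that backs _mutex - the waiter then blocks
       on (or spins over) a dangling address */
    delete c;
}
```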
- I was able to reproduce this problem using the newest version of
gRPC [1] and the current `genodelabs/staging` branch. Reading [0] makes me wonder if the problem can be seen as "yet another instance where the VFS synchronization has bitten us".
This might be the case, and we are working towards single-threaded VFS usage in the libc. Complex, multi-threaded scenarios like yours with third-party monster libraries like gRPC are quite a challenge, which we're ready to accept after this work is done.
- Unlike `lwip`, `lxip` seems to give up after just one or two connections with this warning: `_wait_and_dispatch_one_io_signal called from non-entrypoint thread "pthread.5"`. Might this issue be related, or should we just focus on `lwip`?
I'm afraid this stems from [1] and hints that the execution-model assumptions of the current lxip port are violated. I have no quick fix for this.
[1] https://github.com/genodelabs/genode/blob/dd899fde29448e16c96b2860c391ddcbf2...
- In the GDB backtrace of `thread 13`, e.g. at frame `#5`, I notice a path to `libgcc` that does not belong to my machine. Is this negligible, or does it mess with my GDB output?
Please see my question 3) above.
We just updated the `grpc` and `protobuf` libraries [1], which we will publish shortly. The problem, however, persists with both versions. To reproduce this issue, please check out branch [2] and run `grpc-server-lwip` under Linux. Multiple parallel client connections can be started by running `./stress_parallel_execute.sh 10.0.2.55:8899` from [3]. Note that it sometimes takes a few thousand parallel connections to reproduce the problem.
I must admit that I currently lack the capacity to look deeper into this on the practical side.
Greets
Hi Christian,
thank you for your input and clarifications. Let me sum up my experience of the debugging hell of the last few days.
Regarding your questions:
- Does thread 13 really spin in the spinlock, or does it re-enter the spinlock acquisition quite rapidly?
You were right to question whether thread 13 really spins in the spinlock. It looks like I took a misleading snapshot.
- Could this be data corruption due to invalid pointers or lifetime issues?
I doubt that this is due to a gRPC bug. I created a simplified scenario [1] without gRPC where I could reproduce the/a deadlock (more information below).
- Does the C++ runtime support catching exceptions in Genode::Mutex while unwinding the occurrence of another exception (_Unwind_Find_FDE)?
That's a good question. I tried to verify this by starting several pthreads that try to acquire the same mutex with the following code snippet.
```c++
try {
    log("waiting for mutex");
    Mutex::Guard guard(mutex());
    log("mutex acquired. sleeping.");
    usleep(2000*1000);
    log("throw exception");
    throw -1;
} catch (...) {
    log("caught exception");
}
```
This does not yield any deadlock. After examining the problem for the last few days, I'm not so sure anymore whether throwing an exception has any influence on the deadlock at all.
It is quite a challenge to find out the root of this problem. Surely, it does not help that a huge library like gRPC is involved. To eliminate any influence of gRPC, I created a scenario [1] that tries to boil the problem down to a combination of `pthread`, `lwip`, and dynamic heap allocation with `std::string`. The scenario has one `_server_thread` that listens for new TCP connections and creates a connection thread for each accepted connection. The connection thread allocates a `std::string` with 50k of data, appends whatever was received, and sends the result back. After 10 incoming sessions, the threads that are done are cleaned up with `pthread_join()` from within the `_server_thread`. To stress this server, I use the same gRPC stress client [2] described in my last mail, since the data transferred shouldn't be relevant anyway.
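For reference, the connection handler boils down to something like this (condensed sketch with illustrative names and sizes; the real code is in [1]):

```c++
#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>
#include <string>

/* spawned by _server_thread for each accepted socket, e.g.
 * pthread_create(&t, nullptr, connection_handler, (void *)(intptr_t)fd) */
static void *connection_handler(void *arg)
{
    int const fd = (int)(intptr_t)arg;

    std::string reply(50*1024, 'x');   /* 50k dynamic heap allocation */

    char buf[4096];
    ssize_t const n = recv(fd, buf, sizeof(buf), 0);
    if (n > 0)
        reply.append(buf, (size_t)n);  /* append whatever was received */

    send(fd, reply.data(), reply.size(), 0);
    close(fd);

    pthread_exit(nullptr);             /* reaped later via pthread_join() */
}
```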
After less than 100 connections this results in a deadlock, where all the threads seem to be in a consistent state:
```gdb
(gdb) info threads
  Id   Target Id   Frame
* 1    LWP 76221 "ld.lib.so" pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29
  2    LWP 76227 "ld.lib.so" pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29
  3    LWP 76231 "ld.lib.so" pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29
  4    LWP 76239 "ld.lib.so" pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29
  5    LWP 76736 "ld.lib.so" pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29
  6    LWP 76742 "ld.lib.so" pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29
  7    LWP 76748 "ld.lib.so" pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29
  8    LWP 76749 "ld.lib.so" pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29
  9    LWP 76755 "ld.lib.so" pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29
  10   LWP 76761 "ld.lib.so" pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29
  11   LWP 76762 "ld.lib.so" pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29
  12   LWP 76763 "ld.lib.so" pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29
(gdb) thread apply all bt
Thread 12 (LWP 76763):
#0 pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29
#1 0x000000005004efdd in lx_futex (val=0, op=0, uaddr=<optimized out>) at /home/user/genode/repos/base-linux/src/lib/syscall/linux_syscalls.h:436
#2 thread_stop_myself (myself=<optimized out>) at /home/user/genode/repos/base-linux/src/include/base/internal/lock_helper.h:62
#3 Genode::Cancelable_lock::lock (this=this@entry=0x40bfeb08, myself=...) at /home/user/genode/repos/base/src/lib/base/lock.cc:128
#4 0x000000005004f183 in Genode::Cancelable_lock::lock (this=this@entry=0x40bfeb08) at /home/user/genode/repos/base/src/lib/base/lock.cc:66
#5 0x000000000af520d6 in Genode::Lock::lock (this=this@entry=0x40bfeb08) at /home/user/genode/repos/base/include/base/lock.h:33
#6 0x000000000af60087 in Libc::Pthread_pool::suspend_myself (timeout_ms=<optimized out>, check=..., this=0xaff8878 <unmanaged_singleton<Libc::Kernel, 8, Genode::Env&, Genode::Allocator&>(Genode::Env&, Genode::Allocator&)::object_space+16112>) at /home/user/genode/repos/libports/src/lib/libc/internal/pthread_pool.h:89
#7 Libc::Kernel::suspend (this=0xaff4988 <unmanaged_singleton<Libc::Kernel, 8, Genode::Env&, Genode::Allocator&>(Genode::Env&, Genode::Allocator&)::object_space>, check=..., timeout_ms=<optimized out>) at /home/user/genode/repos/libports/src/lib/libc/internal/kernel.h:506
#8 0x000000000af8477e in Libc::Vfs_plugin::read (this=0xaff6dc0 <unmanaged_singleton<Libc::Kernel, 8, Genode::Env&, Genode::Allocator&>(Genode::Env&, Genode::Allocator&)::object_space+9272>, fd=<optimized out>, buf=0x40bfedd0, count=4096) at /home/user/genode/repos/libports/src/lib/libc/vfs_plugin.cc:885
#9 0x000000000af74178 in do_recvfrom (fd=<optimized out>, buf=0x40bfedd0, len=4096, src_addr=<optimized out>, src_addrlen=0x0, flags=<optimized out>) at /home/user/genode/repos/libports/src/lib/libc/socket_fs_plugin.cc:756
#10 0x000000000af7424b in Libc::Socket_fs::Plugin::read (this=<optimized out>, fd=<optimized out>, buf=<optimized out>, count=<optimized out>) at /home/user/genode/repos/libports/src/lib/libc/socket_fs_plugin.cc:1086
#11 0x0000000001000709 in Echo_server::Server_main::connection_handler (arg=0x7ffff7f7a240) at /home/user/genode/repos/libports/src/test/tcp_echo_server/main.cc:65
#12 0x000000000af66fd1 in Libc::Pthread::Thread_object::entry (this=0x7ffff7f5dfe8) at /home/user/genode/repos/libports/src/lib/libc/pthread.cc:71
#13 0x0000000050095f35 in Genode::Thread::_thread_start () at /home/user/genode/repos/base/include/base/output.h:54
#14 0x00000000500ab752 in thread_start () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_clone.S:62
Thread 11 (LWP 76762):
#0 pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29
#1 0x000000005004efdd in lx_futex (val=0, op=0, uaddr=<optimized out>) at /home/user/genode/repos/base-linux/src/lib/syscall/linux_syscalls.h:436
#2 thread_stop_myself (myself=<optimized out>) at /home/user/genode/repos/base-linux/src/include/base/internal/lock_helper.h:62
#3 Genode::Cancelable_lock::lock (this=this@entry=0x40afeb08, myself=...) at /home/user/genode/repos/base/src/lib/base/lock.cc:128
#4 0x000000005004f183 in Genode::Cancelable_lock::lock (this=this@entry=0x40afeb08) at /home/user/genode/repos/base/src/lib/base/lock.cc:66
#5 0x000000000af520d6 in Genode::Lock::lock (this=this@entry=0x40afeb08) at /home/user/genode/repos/base/include/base/lock.h:33
#6 0x000000000af60087 in Libc::Pthread_pool::suspend_myself (timeout_ms=<optimized out>, check=..., this=0xaff8878 <unmanaged_singleton<Libc::Kernel, 8, Genode::Env&, Genode::Allocator&>(Genode::Env&, Genode::Allocator&)::object_space+16112>) at /home/user/genode/repos/libports/src/lib/libc/internal/pthread_pool.h:89
#7 Libc::Kernel::suspend (this=0xaff4988 <unmanaged_singleton<Libc::Kernel, 8, Genode::Env&, Genode::Allocator&>(Genode::Env&, Genode::Allocator&)::object_space>, check=..., timeout_ms=<optimized out>) at /home/user/genode/repos/libports/src/lib/libc/internal/kernel.h:506
#8 0x000000000af8477e in Libc::Vfs_plugin::read (this=0xaff6dc0 <unmanaged_singleton<Libc::Kernel, 8, Genode::Env&, Genode::Allocator&>(Genode::Env&, Genode::Allocator&)::object_space+9272>, fd=<optimized out>, buf=0x40afedd0, count=4096) at /home/user/genode/repos/libports/src/lib/libc/vfs_plugin.cc:885
#9 0x000000000af74178 in do_recvfrom (fd=<optimized out>, buf=0x40afedd0, len=4096, src_addr=<optimized out>, src_addrlen=0x0, flags=<optimized out>) at /home/user/genode/repos/libports/src/lib/libc/socket_fs_plugin.cc:756
#10 0x000000000af7424b in Libc::Socket_fs::Plugin::read (this=<optimized out>, fd=<optimized out>, buf=<optimized out>, count=<optimized out>) at /home/user/genode/repos/libports/src/lib/libc/socket_fs_plugin.cc:1086
#11 0x0000000001000709 in Echo_server::Server_main::connection_handler (arg=0x7ffff7f7a1f0) at /home/user/genode/repos/libports/src/test/tcp_echo_server/main.cc:65
#12 0x000000000af66fd1 in Libc::Pthread::Thread_object::entry (this=0x7ffff7f5dde8) at /home/user/genode/repos/libports/src/lib/libc/pthread.cc:71
#13 0x0000000050095f35 in Genode::Thread::_thread_start () at /home/user/genode/repos/base/include/base/output.h:54
#14 0x00000000500ab752 in thread_start () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_clone.S:62
Thread 10 (LWP 76761):
#0 pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29
#1 0x000000005004efdd in lx_futex (val=0, op=0, uaddr=<optimized out>) at /home/user/genode/repos/base-linux/src/lib/syscall/linux_syscalls.h:436
#2 thread_stop_myself (myself=<optimized out>) at /home/user/genode/repos/base-linux/src/include/base/internal/lock_helper.h:62
#3 Genode::Cancelable_lock::lock (this=this@entry=0x409feb08, myself=...) at /home/user/genode/repos/base/src/lib/base/lock.cc:128
#4 0x000000005004f183 in Genode::Cancelable_lock::lock (this=this@entry=0x409feb08) at /home/user/genode/repos/base/src/lib/base/lock.cc:66
#5 0x000000000af520d6 in Genode::Lock::lock (this=this@entry=0x409feb08) at /home/user/genode/repos/base/include/base/lock.h:33
#6 0x000000000af60087 in Libc::Pthread_pool::suspend_myself (timeout_ms=<optimized out>, check=..., this=0xaff8878 <unmanaged_singleton<Libc::Kernel, 8, Genode::Env&, Genode::Allocator&>(Genode::Env&, Genode::Allocator&)::object_space+16112>) at /home/user/genode/repos/libports/src/lib/libc/internal/pthread_pool.h:89
#7 Libc::Kernel::suspend (this=0xaff4988 <unmanaged_singleton<Libc::Kernel, 8, Genode::Env&, Genode::Allocator&>(Genode::Env&, Genode::Allocator&)::object_space>, check=..., timeout_ms=<optimized out>) at /home/user/genode/repos/libports/src/lib/libc/internal/kernel.h:506
#8 0x000000000af8477e in Libc::Vfs_plugin::read (this=0xaff6dc0 <unmanaged_singleton<Libc::Kernel, 8, Genode::Env&, Genode::Allocator&>(Genode::Env&, Genode::Allocator&)::object_space+9272>, fd=<optimized out>, buf=0x409fedd0, count=4096) at /home/user/genode/repos/libports/src/lib/libc/vfs_plugin.cc:885
#9 0x000000000af74178 in do_recvfrom (fd=<optimized out>, buf=0x409fedd0, len=4096, src_addr=<optimized out>, src_addrlen=0x0, flags=<optimized out>) at /home/user/genode/repos/libports/src/lib/libc/socket_fs_plugin.cc:756
#10 0x000000000af7424b in Libc::Socket_fs::Plugin::read (this=<optimized out>, fd=<optimized out>, buf=<optimized out>, count=<optimized out>) at /home/user/genode/repos/libports/src/lib/libc/socket_fs_plugin.cc:1086
#11 0x0000000001000709 in Echo_server::Server_main::connection_handler (arg=0x7ffff7f7a1b0) at /home/user/genode/repos/libports/src/test/tcp_echo_server/main.cc:65
#12 0x000000000af66fd1 in Libc::Pthread::Thread_object::entry (this=0x7ffff7f5dbd8) at /home/user/genode/repos/libports/src/lib/libc/pthread.cc:71
#13 0x0000000050095f35 in Genode::Thread::_thread_start () at /home/user/genode/repos/base/include/base/output.h:54
#14 0x00000000500ab752 in thread_start () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_clone.S:62
Thread 9 (LWP 76755):
#0 pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29
#1 0x000000005004efdd in lx_futex (val=0, op=0, uaddr=<optimized out>) at /home/user/genode/repos/base-linux/src/lib/syscall/linux_syscalls.h:436
#2 thread_stop_myself (myself=<optimized out>) at /home/user/genode/repos/base-linux/src/include/base/internal/lock_helper.h:62
#3 Genode::Cancelable_lock::lock (this=this@entry=0x408fecd0, myself=...) at /home/user/genode/repos/base/src/lib/base/lock.cc:128
#4 0x000000005004f183 in Genode::Cancelable_lock::lock (this=this@entry=0x408fecd0) at /home/user/genode/repos/base/src/lib/base/lock.cc:66
#5 0x00000000500753f0 in Genode::Lock::lock (this=<optimized out>) at /home/user/genode/repos/base/include/base/lock.h:33
#6 Genode::Blockade::block (this=<optimized out>) at /home/user/genode/repos/base/include/base/blockade.h:31
#7 Genode::sleep_forever () at /home/user/genode/repos/base/src/lib/base/sleep.cc:21
#8 0x000000000af66f72 in pthread_exit (value_ptr=value_ptr@entry=0x0) at /home/user/genode/repos/libports/src/lib/libc/pthread.cc:558
#9 0x00000000010007d7 in Echo_server::Server_main::connection_handler (arg=0x7ffff7f7a160) at /home/user/genode/repos/libports/src/test/tcp_echo_server/main.cc:91
#10 0x000000000af66fd1 in Libc::Pthread::Thread_object::entry (this=0x7ffff7f5d9d8) at /home/user/genode/repos/libports/src/lib/libc/pthread.cc:71
#11 0x0000000050095f35 in Genode::Thread::_thread_start () at /home/user/genode/repos/base/include/base/output.h:54
#12 0x00000000500ab752 in thread_start () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_clone.S:62
Thread 8 (LWP 76749):
#0 pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29
#1 0x000000005004efdd in lx_futex (val=0, op=0, uaddr=<optimized out>) at /home/user/genode/repos/base-linux/src/lib/syscall/linux_syscalls.h:436
#2 thread_stop_myself (myself=<optimized out>) at /home/user/genode/repos/base-linux/src/include/base/internal/lock_helper.h:62
#3 Genode::Cancelable_lock::lock (this=this@entry=0x407fe0a8, myself=...) at /home/user/genode/repos/base/src/lib/base/lock.cc:128
#4 0x000000005004f183 in Genode::Cancelable_lock::lock (this=this@entry=0x407fe0a8) at /home/user/genode/repos/base/src/lib/base/lock.cc:66
#5 0x000000000af520d6 in Genode::Lock::lock (this=this@entry=0x407fe0a8) at /home/user/genode/repos/base/include/base/lock.h:33
#6 0x000000000af60087 in Libc::Pthread_pool::suspend_myself (timeout_ms=<optimized out>, check=..., this=0xaff8878 <unmanaged_singleton<Libc::Kernel, 8, Genode::Env&, Genode::Allocator&>(Genode::Env&, Genode::Allocator&)::object_space+16112>) at /home/user/genode/repos/libports/src/lib/libc/internal/pthread_pool.h:89
#7 Libc::Kernel::suspend (this=0xaff4988 <unmanaged_singleton<Libc::Kernel, 8, Genode::Env&, Genode::Allocator&>(Genode::Env&, Genode::Allocator&)::object_space>, check=..., timeout_ms=<optimized out>) at /home/user/genode/repos/libports/src/lib/libc/internal/kernel.h:506
#8 0x000000000af83346 in Libc::Vfs_plugin::write (this=0xaff6dc0 <unmanaged_singleton<Libc::Kernel, 8, Genode::Env&, Genode::Allocator&>(Genode::Env&, Genode::Allocator&)::object_space+9272>, fd=0x7ffff7feaa70, buf=<optimized out>, count=50099) at /home/user/genode/repos/libports/src/lib/libc/vfs_plugin.cc:785
#9 0x000000000af71e27 in do_sendto (fd=<optimized out>, buf=0x7ffff7f2c010, len=50099, dest_addr=<optimized out>, dest_addrlen=<optimized out>, flags=<optimized out>) at /home/user/genode/repos/libports/src/lib/libc/socket_fs_plugin.cc:176
#10 0x000000000af73668 in Libc::Socket_fs::Plugin::write (this=<optimized out>, fd=<optimized out>, buf=<optimized out>, count=<optimized out>) at /home/user/genode/repos/libports/src/lib/libc/socket_fs_plugin.cc:1099
#11 0x0000000001000b4f in Echo_server::Server_main::connection_handler (arg=0x7ffff7f7a120) at /home/user/genode/contrib/stdcxx-79ad8821269a443aab21bfc1d0e9b8913332b6d1/include/stdcxx/bits/basic_string.h:2313
#12 0x000000000af66fd1 in Libc::Pthread::Thread_object::entry (this=0x7ffff7f5d7c8) at /home/user/genode/repos/libports/src/lib/libc/pthread.cc:71
#13 0x0000000050095f35 in Genode::Thread::_thread_start () at /home/user/genode/repos/base/include/base/output.h:54
#14 0x00000000500ab752 in thread_start () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_clone.S:62
Thread 7 (LWP 76748):
#0 pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29
#1 0x000000005004efdd in lx_futex (val=0, op=0, uaddr=<optimized out>) at /home/user/genode/repos/base-linux/src/lib/syscall/linux_syscalls.h:436
#2 thread_stop_myself (myself=<optimized out>) at /home/user/genode/repos/base-linux/src/include/base/internal/lock_helper.h:62
#3 Genode::Cancelable_lock::lock (this=this@entry=0x406fecd0, myself=...) at /home/user/genode/repos/base/src/lib/base/lock.cc:128
#4 0x000000005004f183 in Genode::Cancelable_lock::lock (this=this@entry=0x406fecd0) at /home/user/genode/repos/base/src/lib/base/lock.cc:66
#5 0x00000000500753f0 in Genode::Lock::lock (this=<optimized out>) at /home/user/genode/repos/base/include/base/lock.h:33
#6 Genode::Blockade::block (this=<optimized out>) at /home/user/genode/repos/base/include/base/blockade.h:31
#7 Genode::sleep_forever () at /home/user/genode/repos/base/src/lib/base/sleep.cc:21
#8 0x000000000af66f72 in pthread_exit (value_ptr=value_ptr@entry=0x0) at /home/user/genode/repos/libports/src/lib/libc/pthread.cc:558
#9 0x00000000010007d7 in Echo_server::Server_main::connection_handler (arg=0x7ffff7f7a0d0) at /home/user/genode/repos/libports/src/test/tcp_echo_server/main.cc:91
#10 0x000000000af66fd1 in Libc::Pthread::Thread_object::entry (this=0x7ffff7f5d5c8) at /home/user/genode/repos/libports/src/lib/libc/pthread.cc:71
#11 0x0000000050095f35 in Genode::Thread::_thread_start () at /home/user/genode/repos/base/include/base/output.h:54
#12 0x00000000500ab752 in thread_start () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_clone.S:62
Thread 6 (LWP 76742):
#0 pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29
#1 0x000000005004efdd in lx_futex (val=0, op=0, uaddr=<optimized out>) at /home/user/genode/repos/base-linux/src/lib/syscall/linux_syscalls.h:436
#2 thread_stop_myself (myself=<optimized out>) at /home/user/genode/repos/base-linux/src/include/base/internal/lock_helper.h:62
#3 Genode::Cancelable_lock::lock (this=this@entry=0x405fecd0, myself=...) at /home/user/genode/repos/base/src/lib/base/lock.cc:128
#4 0x000000005004f183 in Genode::Cancelable_lock::lock (this=this@entry=0x405fecd0) at /home/user/genode/repos/base/src/lib/base/lock.cc:66
#5 0x00000000500753f0 in Genode::Lock::lock (this=<optimized out>) at /home/user/genode/repos/base/include/base/lock.h:33
#6 Genode::Blockade::block (this=<optimized out>) at /home/user/genode/repos/base/include/base/blockade.h:31
#7 Genode::sleep_forever () at /home/user/genode/repos/base/src/lib/base/sleep.cc:21
#8 0x000000000af66f72 in pthread_exit (value_ptr=value_ptr@entry=0x0) at /home/user/genode/repos/libports/src/lib/libc/pthread.cc:558
#9 0x00000000010007d7 in Echo_server::Server_main::connection_handler (arg=0x7ffff7f7a090) at /home/user/genode/repos/libports/src/test/tcp_echo_server/main.cc:91
#10 0x000000000af66fd1 in Libc::Pthread::Thread_object::entry (this=0x7ffff7f5d3b8) at /home/user/genode/repos/libports/src/lib/libc/pthread.cc:71
#11 0x0000000050095f35 in Genode::Thread::_thread_start () at /home/user/genode/repos/base/include/base/output.h:54
#12 0x00000000500ab752 in thread_start () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_clone.S:62
Thread 5 (LWP 76736):
#0 pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29
#1 0x000000005004efdd in lx_futex (val=0, op=0, uaddr=<optimized out>) at /home/user/genode/repos/base-linux/src/lib/syscall/linux_syscalls.h:436
#2 thread_stop_myself (myself=<optimized out>) at /home/user/genode/repos/base-linux/src/include/base/internal/lock_helper.h:62
#3 Genode::Cancelable_lock::lock (this=this@entry=0x418fecd0, myself=...) at /home/user/genode/repos/base/src/lib/base/lock.cc:128
#4 0x000000005004f183 in Genode::Cancelable_lock::lock (this=this@entry=0x418fecd0) at /home/user/genode/repos/base/src/lib/base/lock.cc:66
#5 0x00000000500753f0 in Genode::Lock::lock (this=<optimized out>) at /home/user/genode/repos/base/include/base/lock.h:33
#6 Genode::Blockade::block (this=<optimized out>) at /home/user/genode/repos/base/include/base/blockade.h:31
#7 Genode::sleep_forever () at /home/user/genode/repos/base/src/lib/base/sleep.cc:21
#8 0x000000000af66f72 in pthread_exit (value_ptr=value_ptr@entry=0x0) at /home/user/genode/repos/libports/src/lib/libc/pthread.cc:558
#9 0x00000000010007d7 in Echo_server::Server_main::connection_handler (arg=0x7ffff7f7a5e0) at /home/user/genode/repos/libports/src/test/tcp_echo_server/main.cc:91
#10 0x000000000af66fd1 in Libc::Pthread::Thread_object::entry (this=0x7ffff7f5ea18) at /home/user/genode/repos/libports/src/lib/libc/pthread.cc:71
#11 0x0000000050095f35 in Genode::Thread::_thread_start () at /home/user/genode/repos/base/include/base/output.h:54
#12 0x00000000500ab752 in thread_start () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_clone.S:62
Thread 4 (LWP 76239):
#0 pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29
#1 0x000000005004efdd in lx_futex (val=0, op=0, uaddr=<optimized out>) at /home/user/genode/repos/base-linux/src/lib/syscall/linux_syscalls.h:436
#2 thread_stop_myself (myself=<optimized out>) at /home/user/genode/repos/base-linux/src/include/base/internal/lock_helper.h:62
#3 Genode::Cancelable_lock::lock (this=this@entry=0x404ff1f8, myself=...) at /home/user/genode/repos/base/src/lib/base/lock.cc:128
#4 0x000000005004f183 in Genode::Cancelable_lock::lock (this=this@entry=0x404ff1f8) at /home/user/genode/repos/base/src/lib/base/lock.cc:66
#5 0x000000000af520d6 in Genode::Lock::lock (this=this@entry=0x404ff1f8) at /home/user/genode/repos/base/include/base/lock.h:33
#6 0x000000000af60087 in Libc::Pthread_pool::suspend_myself (timeout_ms=<optimized out>, check=..., this=0xaff8878 <unmanaged_singleton<Libc::Kernel, 8, Genode::Env&, Genode::Allocator&>(Genode::Env&, Genode::Allocator&)::object_space+16112>) at /home/user/genode/repos/libports/src/lib/libc/internal/pthread_pool.h:89
#7 Libc::Kernel::suspend (this=0xaff4988 <unmanaged_singleton<Libc::Kernel, 8, Genode::Env&, Genode::Allocator&>(Genode::Env&, Genode::Allocator&)::object_space>, check=..., timeout_ms=<optimized out>) at /home/user/genode/repos/libports/src/lib/libc/internal/kernel.h:506
#8 0x000000000af8477e in Libc::Vfs_plugin::read (this=0xaff6dc0 <unmanaged_singleton<Libc::Kernel, 8, Genode::Env&, Genode::Allocator&>(Genode::Env&, Genode::Allocator&)::object_space+9272>, fd=<optimized out>, buf=0x404ff640, count=511) at /home/user/genode/repos/libports/src/lib/libc/vfs_plugin.cc:885
#9 0x000000000af7a781 in Libc::Socket_fs::Context::_read_socket_path (this=0x7ffff7f567e0) at /home/user/genode/repos/os/include/os/path.h:239
#10 Libc::Socket_fs::Context::Context (handle_fd=27, proto=Libc::Socket_fs::Context::TCP, this=0x7ffff7f567e0) at /home/user/genode/repos/libports/src/lib/libc/socket_fs_plugin.cc:190
#11 socket_fs_accept (libc_fd=libc_fd@entry=4, addr=addr@entry=0x404ffbd0, addrlen=addrlen@entry=0x404ff95c) at /home/user/genode/repos/libports/src/lib/libc/socket_fs_plugin.cc:563
#12 0x000000000af7d5d6 in __sys_accept (libc_fd=libc_fd@entry=4, addr=addr@entry=0x404ffbd0, addrlen=addrlen@entry=0x404ff95c) at /home/user/genode/repos/libports/src/lib/libc/socket_operations.cc:80
#13 0x00000000010083ee in Echo_server::Server_main::start_server (envp=<optimized out>) at /home/user/genode/repos/libports/src/test/tcp_echo_server/main.cc:135
#14 0x000000000af66fd1 in Libc::Pthread::Thread_object::entry (this=0x7ffff7f7d068) at /home/user/genode/repos/libports/src/lib/libc/pthread.cc:71
#15 0x0000000050095f35 in Genode::Thread::_thread_start () at /home/user/genode/repos/base/include/base/output.h:54
#16 0x00000000500ab752 in thread_start () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_clone.S:62
Thread 3 (LWP 76231):
#0 pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29
#1 0x000000005004dee5 in lx_recvmsg (flags=0, msg=0x402ff9b0, sockfd=...) at /home/user/genode/repos/base-linux/src/lib/syscall/linux_syscalls.h:208
#2 Genode::ipc_call (dst=..., snd_msgbuf=..., rcv_msgbuf=...) at /home/user/genode/repos/base-linux/src/lib/base/ipc.cc:343
#3 0x0000000050091a47 in Genode::Capability<Genode::Signal_source>::_call<Genode::Signal_source::Rpc_wait_for_signal> (this=this@entry=0x501ba830 <unmanaged_singleton<Genode::Constructible<Signal_handler_thread>, 8>()::object_space+208>, args=...) at /home/user/genode/repos/base/include/base/native_capability.h:62
#4 0x0000000050091dad in Genode::Capability<Genode::Signal_source>::call<Genode::Signal_source::Rpc_wait_for_signal> (this=0x501ba830 <unmanaged_singleton<Genode::Constructible<Signal_handler_thread>, 8>()::object_space+208>) at /home/user/genode/repos/base/include/base/capability.h:163
#5 Genode::Rpc_client<Genode::Signal_source>::call<Genode::Signal_source::Rpc_wait_for_signal> (this=0x501ba828 <unmanaged_singleton<Genode::Constructible<Signal_handler_thread>, 8>()::object_space+200>) at /home/user/genode/repos/base/include/base/rpc_client.h:192
#6 Genode::Signal_source_client::wait_for_signal (this=0x501ba828 <unmanaged_singleton<Genode::Constructible<Signal_handler_thread>, 8>()::object_space+200>) at /home/user/genode/repos/base/src/include/signal_source/client.h:28
#7 Genode::Signal_receiver::dispatch_signals (signal_source=0x501ba828 <unmanaged_singleton<Genode::Constructible<Signal_handler_thread>, 8>()::object_space+200>) at /home/user/genode/repos/base/src/lib/base/signal.cc:304
#8 0x00000000500921c0 in Signal_handler_thread::entry (this=0x501ba760 <unmanaged_singleton<Genode::Constructible<Signal_handler_thread>, 8>()::object_space>) at /home/user/genode/repos/base/include/util/reconstructible.h:151
#9 0x0000000050095f35 in Genode::Thread::_thread_start () at /home/user/genode/repos/base/include/base/output.h:54
#10 0x00000000500ab752 in thread_start () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_clone.S:62
Thread 2 (LWP 76227):
#0 pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29
#1 0x000000005004fee4 in lx_epoll_wait (timeout=-1, maxevents=1, events=0x401ff8a4, epoll=...) at /home/user/genode/repos/base-linux/src/lib/syscall/linux_syscalls.h:262
#2 Genode::Native_thread::Epoll::poll (this=this@entry=0x401fffe8) at /home/user/genode/repos/base-linux/src/lib/base/native_thread.cc:80
#3 0x000000005004e5b9 in Genode::ipc_reply_wait (last_caller=..., exc=..., exc@entry=..., reply_msg=..., request_msg=...) at /home/user/genode/repos/base-linux/src/lib/base/ipc.cc:396
#4 0x0000000050067714 in Genode::Rpc_entrypoint::entry (this=0x500fcea8 <Genode::bootstrap_component()::startup+360>) at /home/user/genode/repos/base/src/lib/base/rpc_dispatch_loop.cc:66
#5 0x0000000050095f35 in Genode::Thread::_thread_start () at /home/user/genode/repos/base/include/base/output.h:54
#6 0x00000000500ab752 in thread_start () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_clone.S:62
Thread 1 (LWP 76221):
#0 pseudo_end () at /home/user/genode/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29
#1 0x000000005004efdd in lx_futex (val=0, op=0, uaddr=<optimized out>) at /home/user/genode/repos/base-linux/src/lib/syscall/linux_syscalls.h:436
#2 thread_stop_myself (myself=<optimized out>) at /home/user/genode/repos/base-linux/src/include/base/internal/lock_helper.h:62
#3 Genode::Cancelable_lock::lock (this=this@entry=0x400ffad0, myself=...) at /home/user/genode/repos/base/src/lib/base/lock.cc:128
#4 0x000000005004f183 in Genode::Cancelable_lock::lock (this=this@entry=0x400ffad0) at /home/user/genode/repos/base/src/lib/base/lock.cc:66
#5 0x0000000050090c4a in Genode::Lock::lock (this=0x400ffad0) at /home/user/genode/repos/base/include/base/lock.h:33
#6 Genode::Blockade::block (this=0x400ffad0) at /home/user/genode/repos/base/include/base/blockade.h:31
#7 Genode::Semaphore::down (this=0x500fdaf0 <Genode::bootstrap_component()::startup+3504>, this@entry=0x400ffad0) at /home/user/genode/repos/base/include/base/semaphore.h:101
#8 Genode::Signal_receiver::block_for_signal (this=this@entry=0x500fdaf0 <Genode::bootstrap_component()::startup+3504>) at /home/user/genode/repos/base/src/lib/base/signal.cc:235
#9 0x00000000500480ca in Genode::Entrypoint::_process_incoming_signals (this=0x500fcea0 <Genode::bootstrap_component()::startup+352>) at /home/user/genode/repos/base/include/util/reconstructible.h:59
#10 0x0000000050048fcd in Genode::Entrypoint::Entrypoint (this=0x500fcea0 <Genode::bootstrap_component()::startup+352>, env=...) at /home/user/genode/repos/base/src/lib/base/entrypoint.cc:360
#11 0x000000005004061d in Genode::Startup::Startup (this=0x500fcd40 <Genode::bootstrap_component()::startup>) at /home/user/genode/repos/base/src/lib/base/component.cc:74
#12 Genode::bootstrap_component () at /home/user/genode/repos/base/src/lib/base/component.cc:254
#13 0x00000000500ab396 in _main () at /home/user/genode/repos/base/src/lib/startup/_main.cc:255
#14 0x0000000000000000 in ?? ()
```
Continuing and stopping the process over and over again results in the same thread backtraces. Looking at them, nothing seems unusual: a few threads are exiting, a few are reading, and one is writing. None of the `Genode::Lock`s point to the same address. I'm lost. Is there anything I'm missing? I'm starting to believe `lwip` is the root cause here...
[1] https://github.com/sidhussmann/genode/tree/lwip-pthread-deadlock
[2] https://github.com/sidhussmann/grpc-connection-stress
Cheers, Sid
Moin Sid,
On Sat, May 16, 2020 at 10:10:27 CEST, Sid Hussmann wrote:
This does not yield any deadlock. After examining the problem for the last few days, I'm not so sure anymore whether throwing an exception has any influence on the deadlock at all.
Thanks for diving into this pit, I was just clutching at this straw but we should put this aside with your current findings.
It is quite a challenge to find out the root of this problem. Surely, it does not help that a huge library like gRPC is involved. To eliminate any influence of gRPC, I created a scenario [1] that tries to boil the problem down to a combination of `pthread`, `lwip`, and dynamic heap allocation with `std::string`. The scenario has one `_server_thread` that listens for new TCP connections and creates a connection thread for each accepted connection. The connection thread allocates a `std::string` with 50k of data, appends whatever was received, and sends the result back. After 10 incoming sessions, the threads that are done are cleaned up with `pthread_join()` from within the `_server_thread`. To stress this server, I use the same gRPC stress client [2] described in my last mail, since the data transferred shouldn't be relevant anyway.
After less than 100 connections this results in a deadlock, where all the threads seem to be in a consistent state:
[...]
Continuing and stopping the process over and over again results in the same thread backtraces. Looking at them, nothing seems unusual: a few threads are exiting, a few are reading, and one is writing. None of the `Genode::Lock`s point to the same address. I'm lost. Is there anything I'm missing? I'm starting to believe `lwip` is the root cause here...
I'm not sure how to interpret correctly what you wrote above. Do you think the processing is stuck completely? Did you try ICMP pings or forced ARP requests against the lwip instance?
From the backtraces I extracted the following picture.
- The initial thread, the entrypoint, and the signal-handler thread are in a normal state and ready to serve incoming I/O signals to the IP stack or the libc.
- Threads 10, 11, and 12 are suspended in read(), which means there was no data available for reading on the call to LWIP read. This could be expected, but should not hold for long if I understand your stress test correctly.
- Thread 8 suspends in write(), which could only happen on an insufficient-buffer condition. The question is: Does it leave this state if LWIP pbufs are freed for this operation?
To narrow down the issue further (and I also suspect lwip to be the culprit), you may add another I/O-signal source to your test, for example a low-periodic wakeup of the entrypoint for a diagnostic log message. If this works (as expected), the entrypoint may itself use some LWIP-related function like nonblocking send or recv, or even socket creation/destruction.
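Such a diagnostic wakeup could look roughly like this (untested sketch against the Timer-session API; the period is arbitrary):

```c++
#include <base/component.h>
#include <base/log.h>
#include <timer_session/connection.h>

struct Heartbeat
{
    Genode::Env &_env;

    Timer::Connection _timer { _env };

    Genode::Signal_handler<Heartbeat> _handler {
        _env.ep(), *this, &Heartbeat::_handle_timeout };

    void _handle_timeout()
    {
        Genode::log("entrypoint still alive");
        /* once this works, also poke lwIP from here, e.g., via a
           nonblocking recv() or a socket creation/destruction */
    }

    Heartbeat(Genode::Env &env) : _env(env)
    {
        _timer.sigh(_handler);
        _timer.trigger_periodic(10*1000*1000); /* every 10 s (microseconds) */
    }
};
```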
It's a pity that the most important information is hidden in the LWIP data structures, like: Is there pbuf memory pressure? How long are the send queues? Are there incoming packets? Do ARP and ICMP still work?
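One way to surface at least some of that hidden state is lwIP's built-in statistics module, assuming the port lets you enable it in `lwipopts.h` (sketch):

```c++
/* in lwipopts.h */
#define LWIP_STATS         1
#define LWIP_STATS_DISPLAY 1
#define MEM_STATS          1
#define MEMP_STATS         1
#define TCP_STATS          1

/* somewhere reachable after the hang, e.g. a diagnostic timeout handler */
#include <lwip/stats.h>

void dump_lwip_state()
{
    stats_display(); /* dumps mem/memp/tcp counters to the log */
}
```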
Greets
Sali Christian,
thank you for your feedback. I focused my debugging more towards lwIP this time.
On 5/18/20 10:20 AM, Christian Helmuth wrote:
- The initial thread, the entrypoint, and the signal-handler thread are in a normal state and ready to serve incoming I/O signals to the IP stack or the libc.
I verified that the entrypoint thread is still running by printing out log messages every 10 seconds. More on that below.
- Threads 10, 11, and 12 are suspended in read(), which means there was no data available for reading on the call to LWIP read. This could be expected, but should not hold for long if I understand your stress test correctly.
- Thread 8 suspends in write(), which could only happen on an insufficient-buffer condition. The question is: Does it leave this state if LWIP pbufs are freed for this operation?
To narrow down the issue further (and I also suspect lwip to be the culprit), you may add another I/O-signal source to your test, for example a low-periodic wakeup of the entrypoint for a diagnostic log message.
Good point. I played around with different memory settings in `lwipopts.h`, such as the `MEM_SIZE`, `MEM(P)_OVERFLOW_CHECK`, `MEM(P)_SANITY_CHECK`, and `*MBOX_SIZE` defines. I also increased the memory for `test-tcp_echo_server` in the run script. None of this had any effect on how many connections can be handled. I'm not sure if I'm chasing a different problem now than in my first mail, but memory overflow does not seem to be the issue here...
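For the record, this is the kind of override I experimented with (the values here are illustrative, not the exact ones I used):

```c++
/* lwipopts.h experiments (illustrative values only) */
#define MEM_SIZE                  (512*1024) /* heap for pbuf payloads  */
#define MEM_OVERFLOW_CHECK        1          /* detect heap overwrites  */
#define MEMP_OVERFLOW_CHECK       1          /* detect pool overwrites  */
#define MEM_SANITY_CHECK          1
#define MEMP_SANITY_CHECK         1
#define TCPIP_MBOX_SIZE           128
#define DEFAULT_TCP_RECVMBOX_SIZE 128
```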
If this works (as expected), the entrypoint may itself use some LWIP-related function like nonblocking send or recv, or even socket creation/destruction.
It's a pity that the most important information is hidden in the LWIP data structures, like: Is there pbuf memory pressure? How long are the send queues? Are there incoming packets? Do ARP and ICMP still work?
Also a good point. To be able to talk to the actual lwIP stack, I removed the `nic_router` from `echo_server_lwip_tcp.run` in [1]. After the "deadlock" occurs, the lwIP stack responds to neither ARP nor ICMP. However, I noticed in Wireshark [0] that lwIP still tries to re-transmit a packet a couple of times, even though the client has closed the connection with a RST (port 8899 -> 38868). I then implemented `Lwip::tcp_poll_callback()` in `lwip/vfs.cc` [2], which gets called to notify the application (in this case the VFS) when a connection is idle. After the "deadlock", the `tcp_poll_callback` function gets called repeatedly, forever. So the lwIP stack does not seem to be completely stuck. However, the RST from the client has not been handled at all.
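For context, wiring up such a poll callback with lwIP's raw API looks roughly like this (sketch; the actual change is in [2]):

```c++
#include <lwip/tcp.h>

/* called by lwIP every 'interval' coarse TCP timer ticks (~500 ms each)
   while the connection is idle */
static err_t tcp_poll_callback(void *arg, struct tcp_pcb *pcb)
{
    /* report idle connections, inspect pcb->state, etc. */
    return ERR_OK;
}

static void attach_poll_callback(struct tcp_pcb *pcb, void *handle)
{
    tcp_arg(pcb, handle);                /* passed back as 'arg'   */
    tcp_poll(pcb, tcp_poll_callback, 4); /* i.e., every ~2 seconds */
}
```

With that in place, a few questions remain: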
1. I'm trying to figure out how a pthread maps to data being sent via lwIP. E.g., when calling `write()` from a pthread, is it the same thread in which lwIP sends the data? If not, is it safe to ignore the hints from [4]?
2. In `Lwip::tcp_sent_callback()` [2] we ignore the length argument completely. Is this intended and handled by the `process_io` functionality?
3. I didn't find an easy way to find out more about lwIP's queues. Given my current findings, does it still make sense to iterate through all of lwIP's linked lists?
4. Since lwIP still handles callbacks and re-transmits packets after the deadlock, the problem seems to be `vfs`-related. Would you agree?
[0] https://cloud.gapfruit.com/s/BeGEZjw9qFMizbd
[1] https://github.com/sidhussmann/genode/tree/lwip-pthread-deadlock
[2] https://github.com/sidhussmann/genode/commit/6c924f5a33739d667a0ec7aa229ccf2...
[3] https://github.com/sidhussmann/genode/blob/6c924f5a33739d667a0ec7aa229ccf2c8...
[4] http://www.nongnu.org/lwip/2_1_x/multithreading.html
Cheers, Sid
Hello Sid,
I tried to run your stress test on Ubuntu 18.04 but only get the following log. Do you have a hint how to get rid of the exception?
./greeter_client 10.0.2.55:8899
Sending greeter to: 10.0.2.55:8899
Floating point exception (core dumped)
I installed the Ubuntu protobuf/grpc programming environment.
apt policy protobuf* grpc*
protobuf-compiler:
  Installed: 3.0.0-9.1ubuntu1
  Candidate: 3.0.0-9.1ubuntu1
  Version table:
 *** 3.0.0-9.1ubuntu1 500
        500 http://de.archive.ubuntu.com/ubuntu bionic/universe amd64 Packages
        100 /var/lib/dpkg/status
protobuf-compiler-grpc:
  Installed: 1.3.2-1.1~build1
  Candidate: 1.3.2-1.1~build1
  Version table:
 *** 1.3.2-1.1~build1 500
        500 http://de.archive.ubuntu.com/ubuntu bionic/universe amd64 Packages
        100 /var/lib/dpkg/status
libgrpc++1:
  Installed: 1.3.2-1.1~build1
  Candidate: 1.3.2-1.1~build1
  Version table:
 *** 1.3.2-1.1~build1 500
        500 http://de.archive.ubuntu.com/ubuntu bionic/universe amd64 Packages
        100 /var/lib/dpkg/status
libgrpc-dev:
  Installed: 1.3.2-1.1~build1
  Candidate: 1.3.2-1.1~build1
  Version table:
 *** 1.3.2-1.1~build1 500
        500 http://de.archive.ubuntu.com/ubuntu bionic/universe amd64 Packages
        100 /var/lib/dpkg/status
libgrpc++-dev:
  Installed: 1.3.2-1.1~build1
  Candidate: 1.3.2-1.1~build1
  Version table:
 *** 1.3.2-1.1~build1 500
        500 http://de.archive.ubuntu.com/ubuntu bionic/universe amd64 Packages
        100 /var/lib/dpkg/status
libgrpc3:
  Installed: 1.3.2-1.1~build1
  Candidate: 1.3.2-1.1~build1
  Version table:
 *** 1.3.2-1.1~build1 500
        500 http://de.archive.ubuntu.com/ubuntu bionic/universe amd64 Packages
        100 /var/lib/dpkg/status
I also patched the stress-test tool to compile with this environment.
git diff

```diff
diff --git a/greeter_client.cc b/greeter_client.cc
index 970e82e..a5207fb 100644
--- a/greeter_client.cc
+++ b/greeter_client.cc
@@ -20,7 +20,7 @@
 #include <memory>
 #include <string>
 
-#include <grpcpp/grpcpp.h>
+#include <grpc++/grpc++.h>
 
 #ifdef BAZEL_BUILD
 #include "examples/protos/helloworld.grpc.pb.h"
```
Greets
Hello Sid,
as I'm not able to fix the floating-point exception I just used the following script and executed it in parallel with the depicted command line.
netcat.sh:
```sh
#!/bin/bash
for i in $(seq 10000); do
    dd if=/dev/zero status=none count=4 | netcat 10.0.2.55 8899 > /dev/null
done
```
parallel -j 20 ./netcat.sh -- $(seq 20)
I also removed the diagnostic messages and after 422 rounds I got the following log messages.
[init -> test-tcp_echo_server] waiting for connection 422
[init -> test-tcp_echo_server] Error: tcp_err_callback arg=null
[init -> test-tcp_echo_server] Error: Assertion "pbuf_free: p->ref > 0" /plain/krishna/src/genode/genode_tmp.git/contrib/lwip-6e0661b21fde397041389d5d8db906b5a6543700/src/lib/lwip/src/core/pbuf.c:753
[init -> test-tcp_echo_server] Error: tcp_poll_callback
[init -> test-tcp_echo_server] Error: tcp_poll_callback socket=653982970 state=4
[init -> test-tcp_echo_server] Error: tcp_poll_callback local_port=8899 remote_port=34522 state=4
[init -> test-tcp_echo_server] Error: tcp_poll_callback
This assertion seems to be very serious, and the code decrements the reference counter even if the assertion fails, which should render this pbuf unreclaimable for a long time due to the integer wrap-around. In my test case, the lwip stack still happily responds to ARP and ICMP ping.
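The arithmetic behind the wrap-around concern, in isolation (the pbuf `ref` field is a small unsigned integer, `u8_t` or `u16_t` depending on the lwIP version):

```c++
#include <cstdint>
#include <cassert>

int main()
{
    uint8_t ref = 0;    /* the state the assertion complains about */
    --ref;              /* unsigned underflow: wraps to 255        */
    assert(ref == 255); /* the pbuf now looks referenced 255 times
                           and is effectively leaked */
    return 0;
}
```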
Is my test approach reasonable? Should we hunt the issue depicted above?
Regards
Hi Christian,
On 5/22/20 11:08 AM, Christian Helmuth wrote:
Hello Sid,
I tried to run your stress test on Ubuntu 18.04 but only get the following log. Do you have a hint how to get rid of the exception?
Building gRPC binaries with the distro-provided libs can be painful. Sometimes it works, sometimes it doesn't. I built a statically linked binary, which you can find here [0].
[0] https://cloud.gapfruit.com/s/YSHxpcCzSXoCnmW
Cheers, Sid
Hi Christian,
On 5/22/20 12:11 PM, Christian Helmuth wrote:
This assertion seems to be very serious, and the code decrements the reference counter even if the assertion fails, which should render this pbuf unreclaimable for a long time due to the integer wrap-around. In my test case, the lwip stack still happily responds to ARP and ICMP ping.
Is my test approach reasonable? Should we hunt the issue depicted above?
It is very reasonable indeed, and we should hunt this issue as well. I ran into this assertion at some point, too. However, I wanted to focus on the not-so-obvious "deadlock" problem first and tackle the assertion afterwards. But I'm open to ideas on how to make lwIP as stable as possible. Should we move this discussion to GitHub and create two issues?
Cheers, Sid
Hey Sid,
On Fri, May 22, 2020 at 14:14:37 CEST, Sid Hussmann wrote:
Building gRPC binaries with the distro provided libs can be painful. Sometime it works, sometime it doesn't. I built a statically linked binary which you can find here [0].
Thanks for the binary, but as long as netcat is reasonable I'll stay with it. If the issue can be reproduced completely without gRPC on both ends, then we don't have to look into its complexity. I'm even thinking about replacing netcat with hping3 if this helps.
Greets
Sid,
On Fri, May 22, 2020 at 14:24:10 CEST, Sid Hussmann wrote:
It is very reasonable indeed, and we should hunt this issue as well. I ran into this assertion at some point, too. However, I wanted to focus on the not-so-obvious "deadlock" problem first and tackle the assertion afterwards. But I'm open to ideas on how to make lwIP as stable as possible.
That's only fair! I'd also love to stabilize one of the network stacks more (whether it be lwip or lxip).
Should we move this discussion to github and create two issues?
I'd much appreciate one new issue for the lwip stress stability, referring to the original test case (gRPC) and the current much-reduced one (your test and netcat).
Regards
Hey Christian,
On 5/22/20 2:46 PM, Christian Helmuth wrote:
Thanks for the binary, but as long as netcat is reasonable I'll stay with it. If the issue can be reproduced completely without gRPC on both ends, then we don't have to look into its complexity. I'm even thinking about replacing netcat with hping3 if this helps.
I have to admit that I currently misuse the gRPC client as a fuzz test to reproduce my "deadlock" problem. I'd very much appreciate a script based on standard Linux tools that reproduces the deadlock. If you have an hping3 script I can use, I'm very happy to switch!
Cheers, Sid