Hi Everyone,
I'm currently using a Nic_connection to exchange data between components on linux_x86.
Before sending data from the Nic server, I first check whether any data can be freed (even before the first packet is sent out), using the code below:
while (_rx.source()->ack_avail())
    _rx.source()->release_packet(_rx.source()->get_acked_packet());
Whenever I send data out from an extern "C" method (a callback passed to a C library), the application crashes. Looking with GDB, the issue seems to be in the destructor of the lock guard in packet_stream.h (full GDB output at the bottom of this email).
bool ready_for_rx()
{
    Genode::Lock::Guard lock_guard(_rx_queue_lock);
    return !_rx_queue->empty();
}
The destructor of the guard simply calls the unlock method of the lock, yet this crashes. Could it be that the unlock method throws an exception in the destructor, or that there are uninitialized variables within the lock itself?
I'm somewhat lost here, because I've used the Nic_connection in similar settings in the past (also called from an extern "C" context as a C callback). This time, however, it consistently breaks whenever I try to send out the first packet from an extern "C" context (it does work when sending the packet from normal C++ code).
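For reference, the call path looks roughly like the sketch below; the names (my_lib_register_callback, My_component, send_packet) are made up for illustration and are not the actual code:

struct My_component
{
    /* submits the data via the Nic session's tx packet stream */
    void send_packet(char const *data, unsigned len);
};

/* callback handed to the C library */
extern "C" void _on_data(void *ctx, char const *data, unsigned len)
{
    /* entering the Nic/packet-stream code from here triggers the crash */
    static_cast<My_component *>(ctx)->send_packet(data, len);
}

/* during initialisation: my_lib_register_callback(_on_data, &component); */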
Any suggestions as to what might cause the crash in my application?
Cheers, Menno
Program received signal SIGSEGV, Segmentation fault.
0x0000000001143c22 in ~Lock_guard (this=<optimized out>, __in_chrg=<optimized out>)
    at <project-path...>/repos/base/include/base/lock_guard.h:42
42          ~Lock_guard() { _lock.unlock(); }
(gdb) bt
#0  0x0000000001143c22 in ~Lock_guard (this=<optimized out>, __in_chrg=<optimized out>)
    at <project-path...>/repos/base/include/base/lock_guard.h:42
#1  ready_for_rx (this=<optimized out>) at <project-path...>/repos/os/include/os/packet_stream.h:400
#2  ack_avail (this=<optimized out>) at <project-path...>/repos/os/include/os/packet_stream.h:686
....
Hi Menno,
You could check the address of _lock in the constructor of Lock_guard and also in the destructor. It might be stack corruption. If the address remains the same, is it the same as the segmentation-fault address?
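A rough sketch of such instrumentation in base/include/base/lock_guard.h (PDBG from base/printf.h is used for illustration; any output mechanism that works at this low level will do):

#include <base/printf.h>

namespace Genode { template <typename LT> class Lock_guard; }

template <typename LT>
class Genode::Lock_guard
{
    private:

        LT &_lock;

    public:

        explicit Lock_guard(LT &lock) : _lock(lock)
        {
            PDBG("lock at %p", &_lock);  /* address at acquisition */
            _lock.lock();
        }

        ~Lock_guard()
        {
            PDBG("lock at %p", &_lock);  /* should match the constructor */
            _lock.unlock();
        }
};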
Cheers,
Sebastian
Hi Sebastian,
Thank you for thinking along!
I'm having a hard time getting the address of the lock during construction and destruction. I tried two methods:

1) Calling PERR in the constructor and destructor, which somehow does not compile. For now I assume this is because the locking code is too low-level to work with an advanced feature like PERR (which probably uses a lock itself somewhere in the calling process).

2) Setting a breakpoint at the problem area near the lock after attaching GDB to the process. Execution does not halt at the intended breakpoint but continues until it segfaults (or it might not even reach that part of the code).
I also notice that when looking at the stack trace (using 'bt'), GDB now seems unable to determine where the segfault happened (no method or file), whereas before it seemed to come from somewhere around 'ready_for_rx'. I did manually load the symbol tables using 'set solib-search-path bin', so all other entries in the stack trace contain clear debugging information.
Looking at kern.log I see the following (sp is the same for each segfault of this process):

Aug 31 11:20:52 knuth kernel: [181321.037468] ld.lib.so[25211]: segfault at 0 ip (null) sp 00000000404ffd48 error 14 in test_app[1000000+16000]
Looking at /proc/N/maps (see full output below), sp seems to point to the following entry:
404fc000-40500000 rwxs 00000000 fc:01 6565435 /tmp/genode-1000/ds-114 (deleted)
The data being sent out over the Nic_connection (a local character array containing "whoopie", which I created for debugging purposes instead of 'real' data) points to an address very near the faulting sp (0x404ffd50 is just 8 bytes above sp 0x404ffd48, within the same mapping):
to_send = {_data = 0x404ffd50 "whoopie", _size = 8}
I'm not fluent enough in either the Genode or the Linux loading process to see what might be going on here, although it makes sense to get a segfault for a 'deleted' entry. Your earlier comment about a possibly corrupted stack might well be right, but there are no obvious stack overflows in the code.
Does anything in this additional information give you an idea of what might be wrong here?
Thank you, Menno
01000000-01016000 r-xs 00001000 fc:01 6565216 <project dir>/build/linux_x86/test_app/test_app
01016000-0101b000 rwxs 00000000 fc:01 6565416 /tmp/genode-1000/ds-95 (deleted)
0101b000-010f6000 r-xs 00001000 fc:01 6557519 <project dir>/build/linux_x86/var/libcache/libc/libc.lib.so
010f6000-0112d000 rwxs 00000000 fc:01 6565420 /tmp/genode-1000/ds-99 (deleted)
0112d000-01151000 r-xs 00001000 fc:01 6560429 <project dir>/build/linux_x86/var/libcache/libcsl/libcsl.lib.so
01151000-01156000 rwxs 00000000 fc:01 6565423 /tmp/genode-1000/ds-102 (deleted)
01156000-0118a000 r-xs 00001000 fc:01 6564859 <project dir>/build/linux_x86/var/libcache/lwip/lwip.lib.so
0118a000-01194000 rwxs 00000000 fc:01 6565427 /tmp/genode-1000/ds-106 (deleted)
01194000-0b000000 ---p 00000000 00:00 0
40000000-400e0000 ---p 00000000 00:00 0
400e0000-40100000 rwxs 00000000 fc:01 6565400 /tmp/genode-1000/ds-82 (deleted)
40100000-401e0000 ---p 00000000 00:00 0
401e0000-40200000 rwxs 00000000 fc:01 6565403 /tmp/genode-1000/ds-84 (deleted)
40200000-402f8000 ---p 00000000 00:00 0
402f8000-40300000 rwxs 00000000 fc:01 6565409 /tmp/genode-1000/ds-88 (deleted)
40300000-403f0000 ---p 00000000 00:00 0
403f0000-40400000 rwxs 00000000 fc:01 6565432 /tmp/genode-1000/ds-111 (deleted)
40400000-404fc000 ---p 00000000 00:00 0
404fc000-40500000 rwxs 00000000 fc:01 6565435 /tmp/genode-1000/ds-114 (deleted)
40500000-405ff000 ---p 00000000 00:00 0
405ff000-40600000 rwxs 00000000 fc:01 6565436 /tmp/genode-1000/ds-115 (deleted)
40600000-406fe000 ---p 00000000 00:00 0
406fe000-40700000 rwxs 00000000 fc:01 6565457 /tmp/genode-1000/ds-123 (deleted)
40700000-50000000 ---p 00000000 00:00 0
50000000-50100000 rwxp 00000000 00:00 0
50100000-5017b000 r-xp 00001000 fc:01 6556675 <project dir>/build/linux_x86/var/libcache/ld/ld.lib.so
5017b000-5018f000 rwxp 0007c000 fc:01 6556675 <project dir>/build/linux_x86/var/libcache/ld/ld.lib.so
5018f000-50249000 rwxp 00000000 00:00 0
2b7c3051d000-2b7c30525000 rwxs 00000000 fc:01 6565392 /tmp/genode-1000/ds-75 (deleted)
2b7c30525000-2b7c30527000 rwxs 00000000 fc:01 6565379 /tmp/genode-1000/ds-63 (deleted)
2b7c30527000-2b7c30529000 rwxs 00000000 fc:01 6565379 /tmp/genode-1000/ds-63 (deleted)
2b7c30529000-2b7c3052b000 rwxs 00000000 fc:01 6565379 /tmp/genode-1000/ds-63 (deleted)
2b7c3052b000-2b7c30533000 rwxs 00000000 fc:01 6565413 /tmp/genode-1000/ds-92 (deleted)
2b7c30533000-2b7c30534000 rwxs 00000000 fc:01 6565383 /tmp/genode-1000/ds-66 (deleted)
2b7c30534000-2b7c30544000 rwxs 00000000 fc:01 6565431 /tmp/genode-1000/ds-110 (deleted)
2b7c30544000-2b7c30546000 rwxs 00000000 fc:01 6565379 /tmp/genode-1000/ds-63 (deleted)
2b7c30546000-2b7c30548000 rwxs 00000000 fc:01 6565379 /tmp/genode-1000/ds-63 (deleted)
2b7c30548000-2b7c30568000 rwxs 00000000 fc:01 6565452 /tmp/genode-1000/ds-119 (deleted)
2b7c30568000-2b7c306f8000 rwxs 00000000 fc:01 6565453 /tmp/genode-1000/ds-120 (deleted)
2b7c306f8000-2b7c30888000 rwxs 00000000 fc:01 6565454 /tmp/genode-1000/ds-121 (deleted)
2b7c30888000-2b7c3088a000 rwxs 00000000 fc:01 6565379 /tmp/genode-1000/ds-63 (deleted)
2b7c3088a000-2b7c30a1a000 rwxs 00000000 fc:01 6565460 /tmp/genode-1000/ds-125 (deleted)
2b7c30a1a000-2b7c30baa000 rwxs 00000000 fc:01 6565464 /tmp/genode-1000/ds-126 (deleted)
2b7c30baa000-2b7c30d3a000 rwxs 00000000 fc:01 6565460 /tmp/genode-1000/ds-125 (deleted)
2b7c30d3a000-2b7c30eca000 rwxs 00000000 fc:01 6565464 /tmp/genode-1000/ds-126 (deleted)
7fffd8f3d000-7fffd8f5e000 rwxp 00000000 00:00 0
7fffd8f76000-7fffd8f78000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Hi Menno,
You need this commit (https://github.com/ssumpf/genode/commit/61177e3be9529bc68e8d6a0af1919bd80733...) for breakpoints to work on Linux. When using gdb on Linux, you have to start gdb in the 'bin' directory of your Genode build directory; this is currently the only way gdb can find all required shared libraries.
You also have to tell gdb the id of the process (actually the thread) you want to debug via the -p option; you can find it with 'ps -efL | grep Genode'. I wrote a small helper script to simplify things (attached).
The way I use gdb on Linux works like this:
1. Put 'wait_for_continue' somewhere in your code (the 'main' function or the 'construct' function). Simply declare the function as 'void wait_for_continue(void)' (C) or 'extern "C" void wait_for_continue()' (C++) and call it; see the sketch after this list.
2. Start your run script (it has to use 'run_genode_until forever'!)
3. Your program will pause in the 'wait_for_continue' function.
4. On a different shell 'cd <build_dir>/bin'
5. The gdp syntax is 'gdp <binary> <thread in binary - starting with zero>'. Usually the thread of interest is the second thread (1):
gdp test-program 1
6. You should see that all required shared libraries are loaded (like ld.lib.so). Set your breakpoint, watchpoint, or anything else, and then enter 'c' for continue.
7. Go back to the shell where you started the scenario and hit <Enter>, which in turn resumes operation.
8. The system should then stop at the breakpoint, assuming you chose the right thread. To find out whether the thread is the right one, use single stepping ('s' or 'n') instead of 'c' and check whether you end up in familiar code.
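For illustration, step 1 could look like this (a minimal sketch; whether you call it from 'main' or from the component's 'construct' function makes no difference):

/* declared here, resolved by Genode's Linux base at link time */
extern "C" void wait_for_continue();

int main()
{
    wait_for_continue();  /* pauses until you hit <Enter> (step 7) */

    /* ... rest of the program ... */
    return 0;
}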
It also helps to compile your code with -O0, which yields more consistent output when single stepping, though it sometimes stops the bug from triggering.
Also, make sure all shared libraries are loaded correctly with 'info shared'; if this does not show anything, the 'share' command sometimes helps.
It took us a while to come up with this solution, but it has helped me a lot. I hope it helps you as well,
Sebastian