Segfault when calling (Cancelable)Lock.unlock?

Sebastian Sumpf Sebastian.Sumpf at ...1...
Wed Aug 31 18:08:01 CEST 2016


Hi Menno,

On 08/31/2016 02:36 PM, Menno Valkema wrote:
> Hi Sebastian,
> 
> Thank you for thinking along!
> 
> I'm having a hard time getting the address of the lock during
> construction and destruction. I tried two methods: 1) Calling PERR in
> the constructor and destructor, which somehow does not compile. For now
> I assume this is because the locking code might be too low-level to work
> with an advanced feature like PERR (which probably uses a lock itself
> somewhere in the calling process). 2) When I try to set a breakpoint
> after attaching GDB to the process at the problem area near the lock,
> the execution does not halt at the intended breakpoint, but continues
> until it segfaults (or it might not even reach that part of the code).

You need this commit
(https://github.com/ssumpf/genode/commit/61177e3be9529bc68e8d6a0af1919bd80733c94d)
for breakpoints to work on Linux. When using gdb on Linux, you also have
to start gdb in the 'bin' directory of your Genode build directory;
this is currently the only way gdb can find all required shared libraries.

You also have to tell gdb the process id (actually the thread id) you
want to debug with the -p option, which you can find out with 'ps -efL
| grep Genode'. I wrote a small helper script ('gdp') to simplify things
(attached).

The way I use gdb on Linux works like this:

1. Put 'wait_for_continue' somewhere in your code ('main' function or
'construct' function). Simply declare the function as 'void
wait_for_continue(void)' (C) or 'extern "C" void wait_for_continue()'
(C++) and call it.

2. Start your run script (it has to use 'run_genode_until forever'!)

3. Your program will pause in the 'wait_for_continue' function.

4. In a different shell, 'cd <build_dir>/bin'

5. The syntax of the attached 'gdp' helper script is: 'gdp <binary>
<thread in binary - starting with zero>'. Usually the thread of interest
is the second thread (1).

> gdp test-program 1

6. You should see that all required shared libraries are loaded (like
ld.lib.so). Set your breakpoint, watchpoint, or anything else and then
enter 'c' to continue.

7. Go back to the shell where you started the scenario and hit <Enter>,
which in turn resumes operation.

8. The system should then stop at the breakpoint, assuming you chose
the right thread. To find out whether the thread is the right one, use
single stepping ('s' or 'n') instead of 'c' and see if you end up in
familiar code.

It also helps to compile your code with -O0, which yields more
consistent output when single stepping (though sometimes the bug no
longer triggers at that optimization level).

Also, make sure all shared libraries are loaded correctly with 'info
shared'; if this does not show anything, the 'sharedlibrary' command
(abbreviated 'share') sometimes helps.

It took us a while to come up with this workflow, but it has helped me a
lot. I hope it helps you as well,

Sebastian

> 
> Also, I notice that when looking at the stack trace (using 'bt'), GDB
> now seems unable to determine where the segfault happened (no method or
> file), whereas before it seemed to come from somewhere around
> 'ready_for_rx'. I did manually load the symbol tables using 'set
> solib-search-path bin', so all other entries in the stack trace consist
> of clear debugging information.
> 
> Looking at kern.log I see the following (sp is the same for each
> segfault for this process):
> Aug 31 11:20:52 knuth kernel: [181321.037468] ld.lib.so[25211]: segfault
> at 0 ip           (null) sp 00000000404ffd48 error 14 in
> test_app[1000000+16000]
> 
> Looking at /proc/N/maps (see full output below), sp seems to point to
> the following entry:
> 
> 404fc000-40500000 rwxs 00000000 fc:01 6565435
> /tmp/genode-1000/ds-114 (deleted)
> 
> Looking at the data being sent out over the Nic_connection (a local
> character array containing "whoopie", which I created for debugging
> purposes instead of 'real' data), it points to an address very near
> the faulting sp.
> 
> to_send = {_data = 0x404ffd50 "whoopie", _size = 8}
> 
> I'm not fluent enough in either the Genode or the Linux loading
> process to see what might be going on here. However, it makes sense to
> get a segfault for some 'deleted' entry. Your previous comment about a
> possibly corrupted stack might be right, although there are no obvious
> stack overflows in the code.
> 
> Does anything in this additional information give you an idea what might
> be wrong here?
> 
> Thank you, Menno
> 
> 01000000-01016000 r-xs 00001000 fc:01 6565216
> <project dir>/build/linux_x86/test_app/test_app
> 01016000-0101b000 rwxs 00000000 fc:01 6565416
> /tmp/genode-1000/ds-95 (deleted)
> 0101b000-010f6000 r-xs 00001000 fc:01 6557519
> <project dir>/build/linux_x86/var/libcache/libc/libc.lib.so
> 010f6000-0112d000 rwxs 00000000 fc:01 6565420
> /tmp/genode-1000/ds-99 (deleted)
> 0112d000-01151000 r-xs 00001000 fc:01 6560429
> <project dir>/build/linux_x86/var/libcache/libcsl/libcsl.lib.so
> 01151000-01156000 rwxs 00000000 fc:01 6565423
> /tmp/genode-1000/ds-102 (deleted)
> 01156000-0118a000 r-xs 00001000 fc:01 6564859
> <project dir>/build/linux_x86/var/libcache/lwip/lwip.lib.so
> 0118a000-01194000 rwxs 00000000 fc:01 6565427
> /tmp/genode-1000/ds-106 (deleted)
> 01194000-0b000000 ---p 00000000 00:00 0
> 40000000-400e0000 ---p 00000000 00:00 0
> 400e0000-40100000 rwxs 00000000 fc:01 6565400
> /tmp/genode-1000/ds-82 (deleted)
> 40100000-401e0000 ---p 00000000 00:00 0
> 401e0000-40200000 rwxs 00000000 fc:01 6565403
> /tmp/genode-1000/ds-84 (deleted)
> 40200000-402f8000 ---p 00000000 00:00 0
> 402f8000-40300000 rwxs 00000000 fc:01 6565409
> /tmp/genode-1000/ds-88 (deleted)
> 40300000-403f0000 ---p 00000000 00:00 0
> 403f0000-40400000 rwxs 00000000 fc:01 6565432
> /tmp/genode-1000/ds-111 (deleted)
> 40400000-404fc000 ---p 00000000 00:00 0
> 404fc000-40500000 rwxs 00000000 fc:01 6565435
> /tmp/genode-1000/ds-114 (deleted)
> 40500000-405ff000 ---p 00000000 00:00 0
> 405ff000-40600000 rwxs 00000000 fc:01 6565436
> /tmp/genode-1000/ds-115 (deleted)
> 40600000-406fe000 ---p 00000000 00:00 0
> 406fe000-40700000 rwxs 00000000 fc:01 6565457
> /tmp/genode-1000/ds-123 (deleted)
> 40700000-50000000 ---p 00000000 00:00 0
> 50000000-50100000 rwxp 00000000 00:00 0
> 50100000-5017b000 r-xp 00001000 fc:01 6556675
> <project dir>/build/linux_x86/var/libcache/ld/ld.lib.so
> 5017b000-5018f000 rwxp 0007c000 fc:01 6556675
> <project dir>/build/linux_x86/var/libcache/ld/ld.lib.so
> 5018f000-50249000 rwxp 00000000 00:00 0
> 2b7c3051d000-2b7c30525000 rwxs 00000000 fc:01 6565392
> /tmp/genode-1000/ds-75 (deleted)
> 2b7c30525000-2b7c30527000 rwxs 00000000 fc:01 6565379
> /tmp/genode-1000/ds-63 (deleted)
> 2b7c30527000-2b7c30529000 rwxs 00000000 fc:01 6565379
> /tmp/genode-1000/ds-63 (deleted)
> 2b7c30529000-2b7c3052b000 rwxs 00000000 fc:01 6565379
> /tmp/genode-1000/ds-63 (deleted)
> 2b7c3052b000-2b7c30533000 rwxs 00000000 fc:01 6565413
> /tmp/genode-1000/ds-92 (deleted)
> 2b7c30533000-2b7c30534000 rwxs 00000000 fc:01 6565383
> /tmp/genode-1000/ds-66 (deleted)
> 2b7c30534000-2b7c30544000 rwxs 00000000 fc:01 6565431
> /tmp/genode-1000/ds-110 (deleted)
> 2b7c30544000-2b7c30546000 rwxs 00000000 fc:01 6565379
> /tmp/genode-1000/ds-63 (deleted)
> 2b7c30546000-2b7c30548000 rwxs 00000000 fc:01 6565379
> /tmp/genode-1000/ds-63 (deleted)
> 2b7c30548000-2b7c30568000 rwxs 00000000 fc:01 6565452
> /tmp/genode-1000/ds-119 (deleted)
> 2b7c30568000-2b7c306f8000 rwxs 00000000 fc:01 6565453
> /tmp/genode-1000/ds-120 (deleted)
> 2b7c306f8000-2b7c30888000 rwxs 00000000 fc:01 6565454
> /tmp/genode-1000/ds-121 (deleted)
> 2b7c30888000-2b7c3088a000 rwxs 00000000 fc:01 6565379
> /tmp/genode-1000/ds-63 (deleted)
> 2b7c3088a000-2b7c30a1a000 rwxs 00000000 fc:01 6565460
> /tmp/genode-1000/ds-125 (deleted)
> 2b7c30a1a000-2b7c30baa000 rwxs 00000000 fc:01 6565464
> /tmp/genode-1000/ds-126 (deleted)
> 2b7c30baa000-2b7c30d3a000 rwxs 00000000 fc:01 6565460
> /tmp/genode-1000/ds-125 (deleted)
> 2b7c30d3a000-2b7c30eca000 rwxs 00000000 fc:01 6565464
> /tmp/genode-1000/ds-126 (deleted)
> 7fffd8f3d000-7fffd8f5e000 rwxp 00000000 00:00 0
> 7fffd8f76000-7fffd8f78000 r-xp 00000000 00:00 0
> [vdso]
> ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0
> [vsyscall]
> 
> 
> 
> 
> On 30-08-16 16:05, Sebastian Sumpf wrote:
>> Hi Menno,
>>
>> On 08/30/2016 03:43 PM, Menno Valkema wrote:
>>> Hi Everyone,
>>>
>>> I'm currently using a Nic_connection to exchange data between
>>> components on linux_x86.
>>>
>>> Before sending data from the Nic server I first check whether any
>>> data can be freed, even before the first packet is sent out, using
>>> the code below:
>>>
>>> while ( _rx.source()->ack_avail() )
>>> {
>>>   _rx.source()->release_packet( _rx.source()->get_acked_packet() );
>>> }
>>>
>>>
>>> Whenever I'm sending data out from an extern "C" method (a callback
>>> passed to a C library), the application crashes. Looking with GDB,
>>> the issue seems to be with the destructor of the lock guard in the
>>> code from packet_stream.h (full GDB output at the bottom of this
>>> email).
>>>
>>> bool ready_for_rx()
>>> {
>>> 	Genode::Lock::Guard lock_guard(_rx_queue_lock);
>>> 	return !_rx_queue->empty();
>>> }
>>>
>>> The destructor of the Guard simply calls the unlock method of the
>>> lock. However, this crashes. Could it be that the unlock method
>>> throws an exception in the destructor, or that there might be
>>> uninitialized variables within the lock itself?
>>>
>>> I'm sort of lost here, because I've used the Nic_connection in
>>> similar settings in the past (also called from an extern "C" context
>>> as a C code callback). However, this time it consistently breaks
>>> whenever I try to send out the first packet from an extern "C"
>>> context (it does work when sending the packet out from normal C++
>>> code).
>>>
>>> Any suggestions as to what might cause the crash in my application?
>>>
>>> Cheers, Menno
>>>
>>>
>>>
>>> Program received signal SIGSEGV, Segmentation fault.
>>> 0x0000000001143c22 in ~Lock_guard (this=<optimized out>,
>>> __in_chrg=<optimized out>)
>>>     at <project-path...>/repos/base/include/base/lock_guard.h:42
>>> 42			~Lock_guard() { _lock.unlock(); }
>>> (gdb) bt
>>> #0  0x0000000001143c22 in ~Lock_guard (this=<optimized out>,
>>> __in_chrg=<optimized out>)
>>>     at <project-path...>/repos/base/include/base/lock_guard.h:42
>>> #1  ready_for_rx (this=<optimized out>) at
>>> <project-path...>/repos/os/include/os/packet_stream.h:400
>>> #2  ack_avail (this=<optimized out>) at
>>> <project-path...>/repos/os/include/os/packet_stream.h:686
>>>
>>
>> You could check the address of _lock in the constructor of Lock_guard
>> and also in the destructor. It might be stack corruption. If the
>> address remains the same, is it the same as the segmentation-fault
>> address?
>>
>> Cheers,
>>
>> Sebastian
>>
>>
>> ------------------------------------------------------------------------------
>> _______________________________________________
>> genode-main mailing list
>> genode-main at lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/genode-main
>>
> 
> 

-------------- next part --------------
#!/usr/bin/perl
#
# gdp - attach gdb to a thread of a running Genode process on Linux
#
# usage: gdp <binary> [<thread index, starting with zero>]

use strict;
use warnings;

# list all threads of the matching Genode process
my $p    = `ps -efL | grep "Genode.*$ARGV[0]"`;
my $line = defined $ARGV[1] ? $ARGV[1] : 0;

my @sp = split "\n", $p;

# match PID and PPID, capture the LWP (thread id) of 'ps -efL' output
$sp[$line] =~ m/[0-9]+\s+[0-9]+\s+([0-9]+)/;

print " $line id $1\n";

exec "gdb -p $1 $ARGV[0]";
