Segfault when calling (Cancelable)Lock.unlock?

Menno Valkema menno.genode at ...9...
Wed Aug 31 14:36:43 CEST 2016


Hi Sebastian,

Thank you for thinking along!

I'm having a hard time getting the address of the lock during
construction and destruction. I tried two approaches: 1) Calling PERR
in the constructor and destructor, which somehow does not compile. For
now I assume this is because the locking code is too low-level to use a
feature like PERR (which probably takes a lock itself somewhere in the
calling process). 2) Setting a breakpoint at the problem area near the
lock after attaching GDB to the process. Execution does not halt at the
intended breakpoint but continues until it segfaults (or it may never
reach that part of the code at all).

I also notice that in the stack trace (using 'bt') GDB is now unable to
determine where the segfault happened (no method or file), whereas
before it seemed to come from somewhere around 'ready_for_rx'. I did
manually load the symbol tables using 'set solib-search-path bin', so
all other entries in the stack trace show clear debugging information.

Looking at kern.log I see the following (sp is the same for each
segfault for this process):
Aug 31 11:20:52 knuth kernel: [181321.037468] ld.lib.so[25211]: segfault
at 0 ip           (null) sp 00000000404ffd48 error 14 in
test_app[1000000+16000]

Looking at /proc/N/maps (see full output below), sp seems to point
to the following entry:

404fc000-40500000 rwxs 00000000 fc:01 6565435
/tmp/genode-1000/ds-114 (deleted)

The data being sent out over the Nic_connection (a local character
array containing "whoopie", which I created for debugging purposes
instead of 'real' data) points to an address very close to the
offending sp address:

to_send = {_data = 0x404ffd50 "whoopie", _size = 8}

I'm not fluent enough in the Genode or Linux loading process to see
what might be going on here, though it makes sense to get a segfault
for a 'deleted' entry. It does seem that your earlier comment about a
possibly corrupted stack might be right; however, there are no obvious
stack overflows in the code.

Does anything in this additional information give you an idea what might
be wrong here?

Thank you, Menno

01000000-01016000 r-xs 00001000 fc:01 6565216
<project dir>/build/linux_x86/test_app/test_app
01016000-0101b000 rwxs 00000000 fc:01 6565416
/tmp/genode-1000/ds-95 (deleted)
0101b000-010f6000 r-xs 00001000 fc:01 6557519
<project dir>/build/linux_x86/var/libcache/libc/libc.lib.so
010f6000-0112d000 rwxs 00000000 fc:01 6565420
/tmp/genode-1000/ds-99 (deleted)
0112d000-01151000 r-xs 00001000 fc:01 6560429
<project dir>/build/linux_x86/var/libcache/libcsl/libcsl.lib.so
01151000-01156000 rwxs 00000000 fc:01 6565423
/tmp/genode-1000/ds-102 (deleted)
01156000-0118a000 r-xs 00001000 fc:01 6564859
<project dir>/build/linux_x86/var/libcache/lwip/lwip.lib.so
0118a000-01194000 rwxs 00000000 fc:01 6565427
/tmp/genode-1000/ds-106 (deleted)
01194000-0b000000 ---p 00000000 00:00 0
40000000-400e0000 ---p 00000000 00:00 0
400e0000-40100000 rwxs 00000000 fc:01 6565400
/tmp/genode-1000/ds-82 (deleted)
40100000-401e0000 ---p 00000000 00:00 0
401e0000-40200000 rwxs 00000000 fc:01 6565403
/tmp/genode-1000/ds-84 (deleted)
40200000-402f8000 ---p 00000000 00:00 0
402f8000-40300000 rwxs 00000000 fc:01 6565409
/tmp/genode-1000/ds-88 (deleted)
40300000-403f0000 ---p 00000000 00:00 0
403f0000-40400000 rwxs 00000000 fc:01 6565432
/tmp/genode-1000/ds-111 (deleted)
40400000-404fc000 ---p 00000000 00:00 0
404fc000-40500000 rwxs 00000000 fc:01 6565435
/tmp/genode-1000/ds-114 (deleted)
40500000-405ff000 ---p 00000000 00:00 0
405ff000-40600000 rwxs 00000000 fc:01 6565436
/tmp/genode-1000/ds-115 (deleted)
40600000-406fe000 ---p 00000000 00:00 0
406fe000-40700000 rwxs 00000000 fc:01 6565457
/tmp/genode-1000/ds-123 (deleted)
40700000-50000000 ---p 00000000 00:00 0
50000000-50100000 rwxp 00000000 00:00 0
50100000-5017b000 r-xp 00001000 fc:01 6556675
<project dir>/build/linux_x86/var/libcache/ld/ld.lib.so
5017b000-5018f000 rwxp 0007c000 fc:01 6556675
<project dir>/build/linux_x86/var/libcache/ld/ld.lib.so
5018f000-50249000 rwxp 00000000 00:00 0
2b7c3051d000-2b7c30525000 rwxs 00000000 fc:01 6565392
/tmp/genode-1000/ds-75 (deleted)
2b7c30525000-2b7c30527000 rwxs 00000000 fc:01 6565379
/tmp/genode-1000/ds-63 (deleted)
2b7c30527000-2b7c30529000 rwxs 00000000 fc:01 6565379
/tmp/genode-1000/ds-63 (deleted)
2b7c30529000-2b7c3052b000 rwxs 00000000 fc:01 6565379
/tmp/genode-1000/ds-63 (deleted)
2b7c3052b000-2b7c30533000 rwxs 00000000 fc:01 6565413
/tmp/genode-1000/ds-92 (deleted)
2b7c30533000-2b7c30534000 rwxs 00000000 fc:01 6565383
/tmp/genode-1000/ds-66 (deleted)
2b7c30534000-2b7c30544000 rwxs 00000000 fc:01 6565431
/tmp/genode-1000/ds-110 (deleted)
2b7c30544000-2b7c30546000 rwxs 00000000 fc:01 6565379
/tmp/genode-1000/ds-63 (deleted)
2b7c30546000-2b7c30548000 rwxs 00000000 fc:01 6565379
/tmp/genode-1000/ds-63 (deleted)
2b7c30548000-2b7c30568000 rwxs 00000000 fc:01 6565452
/tmp/genode-1000/ds-119 (deleted)
2b7c30568000-2b7c306f8000 rwxs 00000000 fc:01 6565453
/tmp/genode-1000/ds-120 (deleted)
2b7c306f8000-2b7c30888000 rwxs 00000000 fc:01 6565454
/tmp/genode-1000/ds-121 (deleted)
2b7c30888000-2b7c3088a000 rwxs 00000000 fc:01 6565379
/tmp/genode-1000/ds-63 (deleted)
2b7c3088a000-2b7c30a1a000 rwxs 00000000 fc:01 6565460
/tmp/genode-1000/ds-125 (deleted)
2b7c30a1a000-2b7c30baa000 rwxs 00000000 fc:01 6565464
/tmp/genode-1000/ds-126 (deleted)
2b7c30baa000-2b7c30d3a000 rwxs 00000000 fc:01 6565460
/tmp/genode-1000/ds-125 (deleted)
2b7c30d3a000-2b7c30eca000 rwxs 00000000 fc:01 6565464
/tmp/genode-1000/ds-126 (deleted)
7fffd8f3d000-7fffd8f5e000 rwxp 00000000 00:00 0
7fffd8f76000-7fffd8f78000 r-xp 00000000 00:00 0
[vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0
[vsyscall]




On 30-08-16 16:05, Sebastian Sumpf wrote:
> Hi Menno,
> 
> On 08/30/2016 03:43 PM, Menno Valkema wrote:
>> Hi Everyone,
>>
>> I'm currently using a Nic_connection to exchange data between
>> components on linux_x86.
>>
>> Before sending data from the Nic server, I first check whether any
>> data can be freed, even before the first packet is sent out, using
>> the code below:
>>
>> while ( _rx.source()->ack_avail() )
>> {
>>   _rx.source()->release_packet( _rx.source()->get_acked_packet() );
>> }
>>
>>
>> Whenever I'm sending data out from an extern "C" method (callback
>> passed to a C library), the application crashes. Looking with GDB,
>> the issue seems to be with the destructor of the lock guard in the
>> code from packet_stream.h (full GDB output at the bottom of this email).
>>
>> bool ready_for_rx()
>> {
>> 	Genode::Lock::Guard lock_guard(_rx_queue_lock);
>> 	return !_rx_queue->empty();
>> }
>>
>> The destructor of the Guard simply calls the unlock method of the
>> lock. However, this crashes. Could it be that the unlock method
>> throws an exception in the destructor, or that there are
>> uninitialized variables within the lock itself?
>>
>> I'm sort of lost here, because I've used the Nic_connection in
>> similar settings in the past (also called from an extern "C" context
>> as a C code callback). However, this time it consistently breaks
>> whenever I try to send out the first packet from an extern "C"
>> context (it does work when sending the packet out from normal C++
>> code).
>>
>> Any suggestions as to what might cause the crash in my application?
>>
>> Cheers, Menno
>>
>>
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x0000000001143c22 in ~Lock_guard (this=<optimized out>,
>> __in_chrg=<optimized out>)
>>     at <project-path...>/repos/base/include/base/lock_guard.h:42
>> 42			~Lock_guard() { _lock.unlock(); }
>> (gdb) bt
>> #0  0x0000000001143c22 in ~Lock_guard (this=<optimized out>,
>> __in_chrg=<optimized out>)
>>     at <project-path...>/repos/base/include/base/lock_guard.h:42
>> #1  ready_for_rx (this=<optimized out>) at
>> <project-path...>/repos/os/include/os/packet_stream.h:400
>> #2  ack_avail (this=<optimized out>) at
>> <project-path...>/repos/os/include/os/packet_stream.h:686
>>
> 
> You could check the address of _lock in the constructor of Lock_guard
> and also in the destructor. It might be stack corruption. In case the
> address remains the same, is it the same as the segfault address?
> 
> Cheers,
> 
> Sebastian
> 
> 
> ------------------------------------------------------------------------------
> _______________________________________________
> genode-main mailing list
> genode-main at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/genode-main
> 
