C++ exception handling deadlock with gcc unwind code in dynamic library

Sebastian Sumpf Sebastian.Sumpf at genode-labs.com
Tue Oct 13 13:28:40 CEST 2020

Hello Alexander,

On 10/12/20 11:30 PM, Alexander Tormasov via users wrote:
> What I found is a deadlock caused by a recursive Linker::mutex acquisition.
> - If an exception is thrown in some code (e.g., code that performs a NOVA syscall; in my case the attach_at() RPC call), it is somehow processed in the caller.
> In particular, during this processing the gcc-injected function _Unwind_Resume produces the following call stack (note the dl_iterate_phdr() frame):
> #0  Linker::mutex () at /home/tor/gen/20.08/repos/base/src/lib/ldso/main.cc:68
> #1  0x0000000000124997 in dl_iterate_phdr (callback=0x119e7a0 <_Unwind_IteratePhdrCallback>, data=0x403fdde0) at /home/tor/gen/20.08/repos/base/src/lib/ldso/exception.cc:41
> #2  0x000000000119fa0f in _Unwind_Find_FDE (pc=0x119dc76 <_Unwind_Resume+54>, bases=bases@entry=0x403fe128) at /home/tor/gen/20.08/contrib/gcc-20345a83596fa42a25a85938329aea54bb4b2146/src/noux-pkg/gcc/libgcc/unwind-dw2-fde-dip.c:469
> #3  0x000000000119bfc3 in uw_frame_state_for (context=context@entry=0x403fe080, fs=fs@entry=0x403fdec0) at /home/tor/gen/20.08/contrib/gcc-20345a83596fa42a25a85938329aea54bb4b2146/src/noux-pkg/gcc/libgcc/unwind-dw2.c:1257
> #4  0x000000000119cfe0 in uw_init_context_1 (context=context@entry=0x403fe080, outer_cfa=outer_cfa@entry=0x403fe2b0, outer_ra=0x1000bcd <Genode::Region_map::attach_at(Genode::Capability<Genode::Dataspace>, unsigned long, unsigned long, long)+259>) at /home/tor/gen/20.08/contrib/gcc-20345a83596fa42a25a85938329aea54bb4b2146/src/noux-pkg/gcc/libgcc/unwind-dw2.c:1586
> #5  0x000000000119dc77 in _Unwind_Resume (exc=0x1b41a8 <Genode::init_cxx_heap(Genode::Env&)::initial_block+5256>) at /home/tor/gen/20.08/contrib/gcc-20345a83596fa42a25a85938329aea54bb4b2146/src/noux-pkg/gcc/libgcc/unwind.inc:235
> #6  0x0000000001000bcd in Genode::Region_map::attach_at (this=0x1304068 <vm_reg0+8648>, ds=..., local_addr=0x80000000, size=0x40000, offset=0x0) at /home/tor/gen/20.08/repos/base/include/region_map/region_map.h:127
> The code of dl_iterate_phdr():
> extern "C" int dl_iterate_phdr(int (*callback) (Phdr_info *info, size_t size, void *data), void *data)
> {
>     int err = 0;
>     Phdr_info info;
>     Mutex::Guard guard(mutex());
>     for (Object *e = obj_list_head();e; e = e->next_obj()) {
>         info.addr  = e->reloc_base();
>         info.name  = e->name();
>         info.phdr  = e->file()->phdr.phdr;
>         info.phnum = e->file()->phdr.count;
>         if (verbose_exception)
>             log(e->name(), " reloc ", Hex(e->reloc_base()));
>         if ((err = callback(&info, sizeof(Phdr_info), data)))
>             break;
>     }
>     return err;
> }
> Note that it takes the Linker::_mutex object (lock).
> While holding it, it calls callback(), which for the main C++ code resolves to
> _Unwind_IteratePhdrCallback
> from contrib/gcc-20345a83596fa42a25a85938329aea54bb4b2146/src/noux-pkg/gcc/libgcc/unwind-dw2-fde-dip.c,
> which internally calls get_fde_encoding() and get_cie_encoding(), which contain this very simple line at
> /home/tor/gen/20.08/contrib/gcc-20345a83596fa42a25a85938329aea54bb4b2146/src/noux-pkg/gcc/libgcc/unwind-dw2-fde.c:300
>   p = aug + strlen ((const char *)aug) + 1; /* Skip the augmentation string.  */
> strlen() is not inlined here.
> In the machine code it calls strlen@plt, which means that strlen is assumed to live in a shared library and has to be resolved by the dynamic linker's relocation code.
> To find the target, the call goes through the jump slot in the PLT, which in turn
> calls the following function from src/lib/ldso/main.cc:294:
> Elf::Addr Ld::jmp_slot(Dependency const &dep, Elf::Size index)
> {
>     Mutex::Guard guard(mutex());
>     if (verbose_relocation)
> Note that it takes the same Linker::_mutex object (lock).
> Voilà!
> We have a recursive acquisition of the same linker mutex and thus a deadlock during exception processing.
> The key problem here is definitely the use of the linker mutex in Genode's implementation of dl_iterate_phdr().
> So the question is: how do we fix this?
> Maybe we need different mutexes for Ld::jmp_slot and dl_iterate_phdr?

The 'strlen' function should be provided by the cxx library
(repos/base/src/lib/cxx/misc.cc) at link time and should therefore not
produce a jump slot (i.e., strlen@plt) at all. So the problem here is that
the jump slot is created in the first place. Is there a way to reproduce this easily?


