C++ exception handling deadlock with gcc unwind code in dynamic library
Sebastian Sumpf
Sebastian.Sumpf at genode-labs.com
Tue Oct 13 13:28:40 CEST 2020
Hello Alexander,
On 10/12/20 11:30 PM, Alexander Tormasov via users wrote:
> What I found is a deadlock caused by a recursive call to Linker::mutex.
>
> - If an exception is raised in some code (e.g. code that issues a NOVA syscall; in my case, the attach_at() RPC call), it is somehow processed in the caller.
> In particular, during this processing, the function _Unwind_Resume injected by gcc produces the following call stack. Note the function dl_iterate_phdr():
>
> #0 Linker::mutex () at /home/tor/gen/20.08/repos/base/src/lib/ldso/main.cc:68
> #1 0x0000000000124997 in dl_iterate_phdr (callback=0x119e7a0 <_Unwind_IteratePhdrCallback>, data=0x403fdde0) at /home/tor/gen/20.08/repos/base/src/lib/ldso/exception.cc:41
> #2 0x000000000119fa0f in _Unwind_Find_FDE (pc=0x119dc76 <_Unwind_Resume+54>, bases=bases@entry=0x403fe128) at /home/tor/gen/20.08/contrib/gcc-20345a83596fa42a25a85938329aea54bb4b2146/src/noux-pkg/gcc/libgcc/unwind-dw2-fde-dip.c:469
> #3 0x000000000119bfc3 in uw_frame_state_for (context=context@entry=0x403fe080, fs=fs@entry=0x403fdec0) at /home/tor/gen/20.08/contrib/gcc-20345a83596fa42a25a85938329aea54bb4b2146/src/noux-pkg/gcc/libgcc/unwind-dw2.c:1257
> #4 0x000000000119cfe0 in uw_init_context_1 (context=context@entry=0x403fe080, outer_cfa=outer_cfa@entry=0x403fe2b0, outer_ra=0x1000bcd <Genode::Region_map::attach_at(Genode::Capability<Genode::Dataspace>, unsigned long, unsigned long, long)+259>) at /home/tor/gen/20.08/contrib/gcc-20345a83596fa42a25a85938329aea54bb4b2146/src/noux-pkg/gcc/libgcc/unwind-dw2.c:1586
> #5 0x000000000119dc77 in _Unwind_Resume (exc=0x1b41a8 <Genode::init_cxx_heap(Genode::Env&)::initial_block+5256>) at /home/tor/gen/20.08/contrib/gcc-20345a83596fa42a25a85938329aea54bb4b2146/src/noux-pkg/gcc/libgcc/unwind.inc:235
> #6 0x0000000001000bcd in Genode::Region_map::attach_at (this=0x1304068 <vm_reg0+8648>, ds=..., local_addr=0x80000000, size=0x40000, offset=0x0) at /home/tor/gen/20.08/repos/base/include/region_map/region_map.h:127
>
> The code of dl_iterate_phdr():
> extern "C" int dl_iterate_phdr(int (*callback) (Phdr_info *info, size_t size, void *data), void *data)
> {
> 	int err = 0;
> 	Phdr_info info;
>
> 	Mutex::Guard guard(mutex());
>
> 	for (Object *e = obj_list_head(); e; e = e->next_obj()) {
>
> 		info.addr  = e->reloc_base();
> 		info.name  = e->name();
> 		info.phdr  = e->file()->phdr.phdr;
> 		info.phnum = e->file()->phdr.count;
>
> 		if (verbose_exception)
> 			log(e->name(), " reloc ", Hex(e->reloc_base()));
>
> 		if ((err = callback(&info, sizeof(Phdr_info), data)))
> 			break;
> 	}
>
> 	return err;
> }
>
> Note that it takes the Linker::_mutex object (lock).
>
> Inside, it calls the callback() function, which for the main C++ code resolves to
> _Unwind_IteratePhdrCallback
> from contrib/gcc-20345a83596fa42a25a85938329aea54bb4b2146/src/noux-pkg/gcc/libgcc/unwind-dw2-fde-dip.c,
> which internally calls get_fde_encoding() and get_cie_encoding(). These contain a very simple line at
> /home/tor/gen/20.08/contrib/gcc-20345a83596fa42a25a85938329aea54bb4b2146/src/noux-pkg/gcc/libgcc/unwind-dw2-fde.c:300:
>
> p = aug + strlen ((const char *)aug) + 1; /* Skip the augmentation string. */
>
> strlen() is not inlined (or otherwise instantiated) here.
> In the machine code it calls strlen@plt, which means that strlen is assumed to live in a shared library, and its address is normally resolved by the linker's relocation code.
>
> To resolve it, the call goes through the jump slot in the PLT, which in turn
> calls the following function from src/lib/ldso/main.cc:294:
> Elf::Addr Ld::jmp_slot(Dependency const &dep, Elf::Size index)
> {
> 	Mutex::Guard guard(mutex());
>
> 	if (verbose_relocation)
> 	…
>
> Note that it takes the same Linker::_mutex object (lock).
> Voilà!
> We have a recursive acquisition of the same linker mutex and thus a deadlock during exception processing.
>
> The key problem here is definitely the use of the linker mutex in Genode's implementation of dl_iterate_phdr().
>
> So, the question is: how do we fix this?
> Maybe we need different mutexes for Ld::jmp_slot and for dl_iterate_phdr?
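The lock re-entry described above can be modelled on a host system. The following C++ sketch is an illustration under assumptions, not the Genode linker code: Checked_mutex, jmp_slot(), iterate_phdr(), linker_mutex and phdr_mutex are hypothetical names. Checked_mutex stands in for the non-recursive Linker::mutex() but throws when its owner tries to lock it again instead of blocking forever, so the deadlock becomes observable. iterate_phdr() holds one mutex while its "callback" ends up in jmp_slot(), mirroring the chain dl_iterate_phdr() -> _Unwind_IteratePhdrCallback -> strlen@plt -> Ld::jmp_slot().

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <stdexcept>
#include <thread>

// Hypothetical stand-in for Linker::mutex(): non-recursive, but it throws
// when the owning thread tries to lock it again instead of hanging forever.
class Checked_mutex {
public:
    void lock() {
        if (_owner.load() == std::this_thread::get_id())
            throw std::logic_error("re-entrant lock would deadlock");
        _m.lock();
        _owner.store(std::this_thread::get_id());
    }

    void unlock() {
        assert(_owner.load() == std::this_thread::get_id());
        _owner.store(std::thread::id());  // default id means "not owned"
        _m.unlock();
    }

private:
    std::mutex _m;
    std::atomic<std::thread::id> _owner{std::thread::id()};
};

// Hypothetical globals: the shared linker mutex and a second mutex that
// models the "different mutexes" proposal.
Checked_mutex linker_mutex;
Checked_mutex phdr_mutex;

// Models Ld::jmp_slot(): lazy PLT binding takes the linker mutex.
void jmp_slot(Checked_mutex &m) {
    std::lock_guard<Checked_mutex> guard(m);
    // symbol lookup and GOT update would happen here
}

// Models dl_iterate_phdr(): it holds `iter_m` while the unwinder callback
// runs, and the callback reaches jmp_slot() through strlen@plt.
// Returns false if the nested lock would have deadlocked.
bool iterate_phdr(Checked_mutex &iter_m, Checked_mutex &jmp_m) {
    std::lock_guard<Checked_mutex> guard(iter_m);
    try {
        jmp_slot(jmp_m);
        return true;
    } catch (std::logic_error const &) {
        return false;
    }
}
```

In this model, iterate_phdr(linker_mutex, linker_mutex) returns false (the inner lock would deadlock), while iterate_phdr(phdr_mutex, linker_mutex), i.e. the split-mutex idea, returns true. The model only shows that separate mutexes remove this particular self-deadlock; it says nothing about whether the object list walked by dl_iterate_phdr() stays consistent without the linker mutex, or about lock ordering between the two mutexes.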
The 'strlen' function should be provided by the cxx library
(repos/base/src/lib/cxx/misc.cc) at link time and should therefore not produce a
jump slot (i.e. strlen@plt). So the problem here is that the jump slot
is created at all. Is there a way to reproduce this easily?
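One possible starting point for a reproducer, offered only as an untested sketch: the backtrace shows _Unwind_Resume being called from Region_map::attach_at(). With GCC, such a call typically sits at the end of a cleanup landing pad, i.e. in a frame that owns a local object with a destructor but has no matching catch. The program below builds that shape with hypothetical names (Cleanup, failing_call(), wrapper(), run_once()) and a plain std::runtime_error in place of the failing RPC. Whether it triggers the jump slot on Genode depends on the component being dynamically linked and on strlen ending up as strlen@plt in the unwinder; that is an assumption, not something confirmed here.

```cpp
#include <cassert>
#include <stdexcept>

// Counts how often Cleanup's destructor has run (hypothetical helper).
static int cleanups = 0;

// A local object with a non-trivial destructor makes GCC emit a cleanup
// landing pad in the frame that owns it.
struct Cleanup {
    ~Cleanup() { ++cleanups; }
};

// Hypothetical stand-in for the failing RPC (attach_at() in the report).
[[noreturn]] void failing_call() {
    throw std::runtime_error("attach failed");
}

// Cleanup but no handler: after ~Cleanup() runs, the landing pad calls
// _Unwind_Resume() to continue unwinding towards run_once().
__attribute__((noinline)) void wrapper() {
    Cleanup c;
    failing_call();
}

// Catches the exception one frame further up; returns true if caught.
bool run_once() {
    int const before = cleanups;
    try {
        wrapper();
    } catch (std::runtime_error const &) {
        // The destructor ran during unwinding, before this handler.
        assert(cleanups == before + 1);
        return true;
    }
    return false;
}
```

On a host, run_once() simply returns true each time. To get closer to the reported setup, the throwing code could be placed in a dynamically linked Genode component or shared library; that placement is a guess, not a confirmed trigger.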
Regards,
Sebastian