Hi all,
I'm debugging some code, and remote debugging with qemu turned out to be really nasty when the image has been merged with elfweaver.
I have a small elf image (built with elfweaver due to the OKL stuff), and this elf image contains a dde driver I've written.
Originally, my driver code was linked to a standard base address like any other executable built with the genode buildsystem.
--
kamikaze@...22...:~/okl4/okl4_2.1.1-fix.7/genode-okl4-x86/bin$ readelf -S test-dde_linux26_net
There are 30 section headers, starting at offset 0x46b030:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        02000000 001000 06c1cc 00  AX  0   0 128
  [ 2] .altinstr_replace PROGBITS        0206c1cc 06d1cc 00011d 00  AX  0   0  1
  [ 3] .sched.text       PROGBITS        0206c2f0 06d2f0 000f26 00  AX  0   0 16
  [ 4] .kprobes.text     PROGBITS        0206d216 06e216 000058 00  AX  0   0  1
  [ 5] .spinlock.text    PROGBITS        0206d26e 06e26e 0001e7 00  AX  0   0  1
  [ 6] .fixup            PROGBITS        0206d455 06e455 00002a 00  AX  0   0  1
  [ 7] __ex_table        PROGBITS        0206d47f 06e47f 000130 00   A  0   0  1
  [ 8] .data             PROGBITS        0206e000 06f000 009cc8 00  WA  0   0 128
  [ 9] .got              PROGBITS        02077cc8 078cc8 000088 04  WA  0   0  4
  [10] .got.plt          PROGBITS        02077d50 078d50 00000c 04  WA  0   0  4
  [11] .init.setup       PROGBITS        02077d5c 078d5c 000030 00  WA  0   0  4
  [12] .eh_frame         PROGBITS        02077d8c 078d8c 006c2c 00   A  0   0  4
  [13] .gcc_except_table PROGBITS        0207e9b8 07f9b8 0012e0 00   A  0   0  4
  [14] .altinstructions  PROGBITS        0207fc98 080c98 000473 00   A  0   0  4
  [15] .smp_locks        PROGBITS        0208010c 08110c 0004ec 00   A  0   0  4
  [16] __param           PROGBITS        020805f8 0815f8 000208 00   A  0   0  4
  [17] .bss              NOBITS          02080800 081800 01e4e0 00  WA  0   0 128
  [18] .debug_abbrev     PROGBITS        00000000 081800 0236a2 00      0   0  1
  [19] .debug_info       PROGBITS        00000000 0a4ea2 2be0b6 00      0   0  1
  [20] .debug_line       PROGBITS        00000000 362f58 03d9ac 00      0   0  1
  [21] .debug_frame      PROGBITS        00000000 3a0904 01ff78 00      0   0  4
  [22] .debug_loc        PROGBITS        00000000 3c087c 06426e 00      0   0  1
  [23] .debug_pubnames   PROGBITS        00000000 424aea 015b4b 00      0   0  1
  [24] .debug_aranges    PROGBITS        00000000 43a638 003858 00      0   0  8
  [25] .debug_str        PROGBITS        00000000 43de90 029bf7 01  MS  0   0  1
  [26] .debug_ranges     PROGBITS        00000000 467a88 003460 00      0   0  8
  [27] .shstrtab         STRTAB          00000000 46aee8 000147 00      0   0  1
  [28] .symtab           SYMTAB          00000000 46b4e0 011760 10     29 1788  4
  [29] .strtab           STRTAB          00000000 47cc40 01b474 00      0   0  1
--
Upon merging all the binaries with elfweaver, only the base address of 'core' remained almost as it was linked originally. My driver instead was re-linked, so that its sections didn't overlap with any other.
--
Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0

  [10] core.text         PROGBITS        02000000 04b000 033590 00  AX  0   0 32
  [11] core.data         PROGBITS        02034000 07f000 000ca4 00  WA  0   0 32
  [12] core.got          PROGBITS        02034ca4 07fca4 000088 04  WA  0   0  4
  [13] core.got.plt      PROGBITS        02034d2c 07fd2c 00000c 04  WA  0   0  4
  [14] core.eh_frame     PROGBITS        02034d38 07fd38 007c88 00   A  0   0  4
  [15] core.gcc_except_t PROGBITS        0203c9c0 0879c0 001858 00   A  0   0  4
  [16] core.bss          NOBITS          0203e220 089218 01a570 00  WA  0   0 32

  [21] test-dde_linux26_ PROGBITS        00c00000 2ff000 4980b4 00  WA  0   0  1
  [22] bootinfo          PROGBITS        04004000 798000 001000 00  WA  0   0  1
  [23] .shstrtab         STRTAB          00000000 799000 000116 00      0   0  1
--
When I now start to remote-debug my stuff...
-- add-symbol-file /home/kamikaze/okl4/okl4_2.1.1-fix.7/genode-okl4-x86/bin/test-dde_linux26_net 0x00c00000 --
...and set a breakpoint in dde_thread_main...
--
(gdb) b dde_thread_main
Breakpoint 2 at 0xc0183e: file /home/kamikaze/genode/sandbox/src/test/dde_linux26_net/main.cc, line 156.
(gdb) c
--
...this breakpoint is never triggered, although the thread obviously passes it.
So I wonder what init does when it starts up my code. Does my program's virtual address space look like the elfweaver-merged stuff?
Or does it look like the original (how would that work)?
Or does init even choose another base address? How would I then introduce the symbol-file to gdb?
Thanx in advance
Sven -- Sven Fülster
Hello Sven,
Sven Fülster wrote:
> (gdb) b dde_thread_main
> Breakpoint 2 at 0xc0183e: file /home/kamikaze/genode/sandbox/src/test/dde_linux26_net/main.cc, line 156.
> (gdb) c
> --
> ...this breakpoint is never triggered, although the thread obviously passes it.
> So I wonder what init does when it starts up my code. Does my program's virtual address space look like the elfweaver-merged stuff?
> Or does it look like the original (how would that work)?
we use elfweaver in a rather unconventional way to treat all binaries except core as plain data. As you may have noticed, we use only one PD declaration (in our example, it is named "modules") that serves as a container for all boot modules provided by core's ROM service. Elfweaver regards the specified files as plain binary data and just concatenates them in the resulting single-image. When core starts up, core looks for a memsection called "init" and ELF-loads the data contained in this memsection. During the construction of the new process, core creates the address space for init according to ELF information found in the "init" binary. Once init starts up, it does the same procedure for all files specified in its config file. Both core and init contain an ELF loader.
In your case, the address 0xc00000 is just the core-local address to which the boot loader loaded the data blob (dunno why the section is called test-dde_linux26_). It only has a meaning within core and, consequently, setting a breakpoint to that address range has no effect. Instead you will need to set the breakpoint to the virtual address of your program starting at the virtual address 0x2000000. However, each program is linked to the same virtual address (defined in base-okl4/mk/spec-okl4_x86.mk). So the use of breakpoints when executing multiple processes may still be cumbersome because of the aliasing of the processes' virtual address spaces.
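To make the address mix-up concrete, here is a small sketch of the arithmetic. It assumes that gdb simply shifted the symbols from the link base to the base passed to add-symbol-file; the helper name `rebase` is made up for this illustration.

```python
def rebase(addr, assumed_base, link_base):
    """Translate an address computed relative to assumed_base
    into the corresponding address relative to link_base."""
    return addr - assumed_base + link_base


# Breakpoint address reported by gdb after loading the symbols at 0x00c00000
wrong_breakpoint = 0x00c0183e

# Address the same breakpoint would have at the link address 0x02000000
print(hex(rebase(wrong_breakpoint, 0x00c00000, 0x02000000)))
```

Under that assumption, the breakpoint belongs at offset 0x183e from the link address rather than from the core-local load address.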
> Or does init even choose another base address? How would I then introduce the symbol-file to gdb?
Init chooses the base address as found in the ELF header of your program. Have you already tried using the original link address?
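As a rough illustration of where those addresses come from, the following sketch lists the loadable segments of a little-endian 32-bit ELF file, similar to what `readelf -l` shows. It is not the loader used by core or init, only a minimal reader of the standard ELF32 header layout; the function name `load_segments` is made up for this example.

```python
import struct
import sys

PT_LOAD = 1  # program header type of a loadable segment


def load_segments(image):
    """Return a list of (vaddr, memsz) for every PT_LOAD segment
    of a little-endian ELF32 image given as bytes."""
    if image[:4] != b"\x7fELF" or image[4] != 1 or image[5] != 1:
        raise ValueError("not a little-endian ELF32 image")

    # ELF32 file header: e_ident, e_type, e_machine, e_version, e_entry,
    # e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum, ...
    header = struct.unpack_from("<16sHHIIIIIHHHHHH", image, 0)
    phoff, phentsize, phnum = header[5], header[9], header[10]

    segments = []
    for i in range(phnum):
        # ELF32 program header: p_type, p_offset, p_vaddr, p_paddr,
        # p_filesz, p_memsz, p_flags, p_align
        p = struct.unpack_from("<8I", image, phoff + i * phentsize)
        if p[0] == PT_LOAD:
            segments.append((p[2], p[5]))
    return segments


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for vaddr, memsz in load_segments(f.read()):
            print("LOAD vaddr=0x%08x memsz=0x%x" % (vaddr, memsz))
```

Run against the original binary (not the merged image), it should report the virtual addresses at which the segments are expected to appear in the program's address space.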
BTW, if you succeed in using qemu for debugging, would you like to write up your experience as a Wiki page at genode.org? I think that your experience could be very valuable for other developers as well.
,-)
Regards Norman
Hi Norman,
> we use only one PD declaration (in our example, it is named "modules") that serves as a container for all boot modules provided by core's ROM service. Elfweaver regards the specified files as plain binary data and just concatenates them in the resulting single-image.
OK, I had wondered how the original link address would be preserved.
> Instead you will need to set the breakpoint to the virtual address of your program starting at the virtual address 0x2000000. However, each program is linked to the same virtual address (defined in base-okl4/mk/spec-okl4_x86.mk). So the use of breakpoints when executing multiple processes may still be cumbersome because of the aliasing of the processes' virtual address spaces.
That was a nasty experience when I worked in the l4ka-pistachio case without elfweaver. The gdbserver just sends virtual addresses to the host, where gdb maps them to the currently loaded binary - regardless of the actual running process, indeed.
> Init chooses the base address as found in the ELF header of your program. Have you already tried using the original link address?
I've just tried loading the symbol file into gdb without providing an extra address and now the behaviour in the elfweaver case is the same as in the l4ka case.
> BTW, if you succeed in using qemu for debugging, would you like to write up your experience as a Wiki page at genode.org? I think that your experience could be very valuable for other developers as well.
> ,-)
Don't consider me so smart that this could be really helpful :)
There are a few ideas so far:
1. Assume you have a breakpoint in main. When booting the system, this breakpoint will be triggered several times, and each time you will have to check whether you're in the right process. (Have you seen a printf telling you that your binary has just been started? Does the stack trace make sense? If it shows printf calling dde_thread_main, that is suspicious. Do your local variables contain sensible data?)
This part is not really annoying. Things become nasty as soon as you start to step through your code and suddenly another thread is scheduled.
If you step by machine instruction rather than by source line, this won't happen, because interrupts are disabled in this mode (stepping by source line corresponds to continuing to the next breakpoint). But this is tedious when you want to step over more than a few instructions.
2. Enhance the gdbserver
3. Provide a special link address at build time, just for the program you want to debug. I think this is the approach I'll try first.
And yes, I don't think we'd keep our 'best practices' a secret as soon as we've learned them :)
Thank you for the expeditious explanation.
Sven -- Sven Fülster
Hi Norman,
my nice plan for supporting debugging by linking my executable to another address failed.
I've added
--
LD_SCRIPT = $(call select_from_repositories,src/platform/genode.ld)
CXX_LINK_OPT = -static -nostdlib -Wl,-nostdlib
CXX_LINK_OPT += -Wl,-T$(LD_SCRIPT) -Wl,-Ttext=0x01000000
--
to its target.mk. That worked fine.
--
kamikaze@...22...:~/okl4/okl4_2.1.1-fix.7/genode-okl4-x86$ readelf -S bin/test-dde_linux26_net
There are 30 section headers, starting at offset 0x4ec040:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        01000000 082000 06c1cc 00  AX  0   0 128
  [ 2] .altinstr_replace PROGBITS        0106c1cc 0ee1cc 00011d 00  AX  0   0  1
  [ 3] .sched.text       PROGBITS        0106c2f0 0ee2f0 000f26 00  AX  0   0 16
  [ 4] .kprobes.text     PROGBITS        0106d216 0ef216 000058 00  AX  0   0  1
  [ 5] .spinlock.text    PROGBITS        0106d26e 0ef26e 0001e7 00  AX  0   0  1
  [ 6] .fixup            PROGBITS        0106d455 0ef455 00002a 00  AX  0   0  1
  [ 7] __ex_table        PROGBITS        0106d47f 0ef47f 000130 00   A  0   0  1
  [ 8] .data             PROGBITS        0106e000 0f0000 009cd0 00  WA  0   0 128
  [ 9] .got              PROGBITS        01077cd0 0f9cd0 000088 04  WA  0   0  4
  [10] .got.plt          PROGBITS        01077d58 0f9d58 00000c 04  WA  0   0  4
  [11] .init.setup       PROGBITS        01077d64 0f9d64 000030 00  WA  0   0  4
  [12] .eh_frame         PROGBITS        01077d94 0f9d94 006c30 00   A  0   0  4
  [13] .gcc_except_table PROGBITS        0107e9c4 1009c4 0012e0 00   A  0   0  4
  [14] .altinstructions  PROGBITS        0107fca4 101ca4 000473 00   A  0   0  4
  [15] .smp_locks        PROGBITS        01080118 102118 0004ec 00   A  0   0  4
  [16] __param           PROGBITS        01080604 102604 000208 00   A  0   0  4
  [17] .bss              NOBITS          01080880 10280c 01e4e0 00  WA  0   0 128
  [18] .debug_abbrev     PROGBITS        00000000 10280c 0236a2 00      0   0  1
  [19] .debug_info       PROGBITS        00000000 125eae 2be0b6 00      0   0  1
  [20] .debug_line       PROGBITS        00000000 3e3f64 03d9ac 00      0   0  1
  [21] .debug_frame      PROGBITS        00000000 421910 01ff78 00      0   0  4
  [22] .debug_loc        PROGBITS        00000000 441888 06426e 00      0   0  1
  [23] .debug_pubnames   PROGBITS        00000000 4a5af6 015b4b 00      0   0  1
  [24] .debug_aranges    PROGBITS        00000000 4bb648 003858 00      0   0  8
  [25] .debug_str        PROGBITS        00000000 4beea0 029bf7 01  MS  0   0  1
  [26] .debug_ranges     PROGBITS        00000000 4e8a98 003460 00      0   0  8
  [27] .shstrtab         STRTAB          00000000 4ebef8 000147 00      0   0  1
  [28] .symtab           SYMTAB          00000000 4ec4f0 011760 10     29 1788  4
  [29] .strtab           STRTAB          00000000 4fdc50 01b474 00      0   0  1
--
But then, the mapping of .text and .data sections failed during startup. The same happened when I tried 0x04000000 as base address.
--
[init] Genode::addr_t _setup_elf(Genode::Parent_capability, Genode::Dataspace_capability, Genode::Ram_session&, Genode::Rm_session&): addresses differ after attach (addr=1000000 out_ptr=0)
[init] Genode::addr_t _setup_elf(Genode::Parent_capability, Genode::Dataspace_capability, Genode::Ram_session&, Genode::Rm_session&): addresses differ after attach (addr=106e000 out_ptr=0)
Start thread ip=10545d4 sp=0, pd=4, tid=0
no RM attachment (READ pf_addr=0 pf_ip=10545d4 from 04)
--
Is there anything else I must know about the startup process?
Sven -- Sven Fülster
Hi Sven,
> I've added
> --
> LD_SCRIPT = $(call select_from_repositories,src/platform/genode.ld)
> CXX_LINK_OPT = -static -nostdlib -Wl,-nostdlib
> CXX_LINK_OPT += -Wl,-T$(LD_SCRIPT) -Wl,-Ttext=0x01000000
> --
> to its target.mk. That worked fine.
It is actually sufficient to specify:
CXX_LINK_OPT += -Wl,-Ttext=0x01000000
Please note that this way, the -Ttext argument will appear twice (once with your custom value and once with the original value) at the linker command line. You can see this when building with 'make VERBOSE='. However, fortunately, the linker seems to use the value of the first occurrence, which is the custom value specified in your 'target.mk'.
> But then, the mapping of .text and .data sections failed during startup. The same happened when I tried 0x04000000 as base address.
> --
> [init] Genode::addr_t _setup_elf(Genode::Parent_capability, Genode::Dataspace_capability, Genode::Ram_session&, Genode::Rm_session&): addresses differ after attach (addr=1000000 out_ptr=0)
> [init] Genode::addr_t _setup_elf(Genode::Parent_capability, Genode::Dataspace_capability, Genode::Ram_session&, Genode::Rm_session&): addresses differ after attach (addr=106e000 out_ptr=0)
> Start thread ip=10545d4 sp=0, pd=4, tid=0
> no RM attachment (READ pf_addr=0 pf_ip=10545d4 from 04)
> --
> Is there anything else I must know about the startup process?
Normally, this output occurs if there are region conflicts. The ELF loader specifies the virtual address to which each segment should be mapped. If there already is a region that overlaps with the specified address range, core refuses to attach the dataspace. However, it is not clear to me what could cause the conflict here. When comparing your last readelf output with the output you posted yesterday, there are several differences apart from the changed base address. Is this the same program? One strange thing is the file offset of your text segment. In your last output, the offset is 0x82000 and no section uses the file offsets below that value. In yesterday's output, the file offset of the text segment has a reasonable value of 0x1000.
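The conflict rule described above can be sketched as follows. This is not core's actual region-manager code, only a simplified model of attaching at a fixed address; the names `overlaps` and `attach`, and the use of None to signal a refused attachment, are assumptions made for this illustration.

```python
def overlaps(a_start, a_size, b_start, b_size):
    """True if the half-open ranges [a_start, a_start + a_size) and
    [b_start, b_start + b_size) share at least one address."""
    return a_start < b_start + b_size and b_start < a_start + a_size


def attach(regions, start, size):
    """Try to attach a region at the fixed address 'start'.
    Returns 'start' on success, or None if it would overlap an
    already attached region (the list is left unchanged then)."""
    for r_start, r_size in regions:
        if overlaps(start, size, r_start, r_size):
            return None
    regions.append((start, size))
    return start


regions = []
# Two adjacent regions attach without conflict
print(attach(regions, 0x01000000, 0x6e000) is not None)
print(attach(regions, 0x0106e000, 0xa000) is not None)
# A region that falls inside the first one is refused
print(attach(regions, 0x01001000, 0x1000) is None)
```

In this model, a caller that requested a fixed address and gets something else back knows that the attachment failed, which matches the kind of check behind the "addresses differ after attach" message.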
I have tried the link address 0x1000000 for the init binary (loaded by core) and the link address 0x3000000 for launchpad (loaded by init) and both programs are loaded just fine. I also tried changing the link address of test-dde_linux26_usbhid to 0x1000000 (to see if the issue is related to DDE) and the result gets loaded without problems as well. In all cases, the file offset of the text segment is 0x1000.
Could you try changing the link address of a simple binary (e.g., init) first, just to see whether this issue is specific to your particular program?
BTW, are you using the Genode SVN or the last snapshot?
Regards Norman
Hi Norman,
I was using rev. 58, just updated to rev. 61, but that did not help. I'm using the SVN but not always the newest revision. For linux_drivers, I have a private repository where I persist my own work.
I have then changed the CXX_LINK_OPT variable according to your suggestion and apparently that was the point. Now everything works fine. Perhaps I forgot some important linker option or your linker script was not found from the location I moved the call to. That might also explain the weird offset for the .text section.
Debugging seems to be nice now. Perhaps it really was that easy; we'll see what happens over the next few days. You have a nice build system.
Thank you
Sven -- Sven Fülster