Hi all,
I'm struggling with modifying the Ram_dataspace in base-linux. I have replaced the file descriptor opened in
void Ram_dataspace_factory::_export_ram_ds(Dataspace_component *ds)
in base-linux/src/core/ram_dataspace_support.cc with a descriptor obtained by opening a special device file that provides mappable RAM.
I have attached the exact diff that causes the segfault. The kernel module that provides the device file and the ioctl is available at [1]. I have tested it on a Linux system and mapping memory and reading/writing it worked flawlessly.
With this change, every component triggers a segmentation fault. I was not able to trace it; I couldn't even find out in which file it occurs. The instruction pointer randomly appeared in repos/base/src/lib/base/lock.cc or in repos/base/src/lib/base/heap.cc, neither of which I have touched.
Is there any information about where this is used exactly? I was unable to trace the usage of the mapped memory beyond Region_map_mmap::attach. I have also traced the calls inside the kernel module and it looks as if it only uses open, mmap and close, so there shouldn't be a problem. I'd like to give you further information about this, but I have no idea what else would be relevant.
Regards, Johannes
[1]: https://github.com/jklmnn/hwiodev/blob/0f5d367162812ee015434e5b97fdf208ef362...
Hi Johannes,
On 23.03.2018 18:19, Johannes Kliemann wrote:
I have attached the exact diff that causes the segfault. The kernel module that provides the device file and the ioctl is available at [1]. I have tested it on a Linux system and mapping memory and reading/writing it worked flawlessly.
have you tried to add this test directly in '_export_ram_ds' inside core? That is, 'mmap'ing it locally, writing 'ds->size' number of bytes, reading them back, followed by clearing them. Just to be sure.
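For illustration, a rough sketch of such a test, written with plain POSIX calls (inside core you would use the corresponding 'lx_*' syscall wrappers instead, and 'ds_fd'/'ds_size' are merely placeholders for the dataspace's file descriptor and size):

  #include <sys/mman.h>
  #include <string.h>

  /* sketch only: ds_fd and ds_size stand for the dataspace's fd and size */
  static bool probe_ram_ds(int ds_fd, size_t ds_size)
  {
      void *ptr = mmap(nullptr, ds_size, PROT_READ | PROT_WRITE,
                       MAP_SHARED, ds_fd, 0);
      if (ptr == MAP_FAILED)
          return false;

      unsigned char *p = (unsigned char *)ptr;

      memset(p, 0xaa, ds_size);              /* write ds_size bytes */

      bool ok = true;
      for (size_t i = 0; i < ds_size; i++)   /* read them back      */
          if (p[i] != 0xaa) { ok = false; break; }

      memset(p, 0, ds_size);                 /* finally clear them  */

      munmap(ptr, ds_size);
      return ok;
  }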
With this change, every component triggers a segmentation fault. I was not able to trace it; I couldn't even find out in which file it occurs. The instruction pointer randomly appeared in repos/base/src/lib/base/lock.cc or in repos/base/src/lib/base/heap.cc, neither of which I have touched.
RAM dataspaces are used as backing store for any dynamically allocated memory. So it is not surprising to see this fault in the heap. The segmentation fault in the lock implementation is certainly caused by a lock that resides within a dynamically allocated object.
To get a picture where RAM dataspaces are allocated, you may instrument the client side of the PD session interface, i.e., placing a log message in base/include/pd_session/client.h inside the 'alloc' method.
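For example, such an instrumentation could look roughly as follows (the signature shown here is only illustrative and may differ in your version of Genode, so adjust it to what 'pd_session/client.h' actually contains):

  /* base/include/pd_session/client.h -- illustrative only */
  Ram_dataspace_capability alloc(size_t size, Cache_attribute cached) override
  {
      /* added instrumentation: shows each RAM-dataspace allocation */
      Genode::log("RAM dataspace alloc, size=", size);
      return call<Rpc_alloc>(size, cached);
  }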
Is there any information about where this is used exactly? I was unable to trace the usage of the mapped memory beyond Region_map_mmap::attach. I have also traced the calls inside the kernel module and it looks as if it only uses open, mmap and close, so there shouldn't be a problem.
I suspect that 'mmap' does not work correctly for some reason (wrong size calculation, or a permission issue?). Maybe you can print the local addresses returned by the 'lx_mmap' calls in 'region_map_mmap' and correlate the fault addresses (as printed by the kernel) with the instrumentation output? Also, you may try placing the access test (the one I proposed for core above) here to let the RAM-dataspace-using component fail predictably instead of randomly, and investigate from there.
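A tiny helper along these lines (the name is made up), called right after each 'lx_mmap' in 'region_map_mmap.cc', would let you correlate the kernel-reported fault addresses with the logged ranges:

  #include <base/log.h>

  /* print each new mapping so that fault addresses reported by the
     kernel can be matched against the logged ranges */
  static void log_mapping(void *start, Genode::size_t size)
  {
      using namespace Genode;
      raw("lx_mmap -> [", Hex((addr_t)start), ",",
          Hex((addr_t)start + size), "), size=", size);
  }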
Good luck! Norman
Hi Norman,
have you tried to add this test directly in '_export_ram_ds' inside core? That is, 'mmap'ing it locally, writing 'ds->size' number of bytes, reading them back, followed by clearing them. Just to be sure.
Yes I did. I also added such a test in _map_local and in both cases I could read and write the whole memory.
RAM dataspaces are used as backing store for any dynamically allocated memory. So it is not surprising to see this fault in the heap. The segmentation fault in the lock implementation is certainly caused by a lock that resides within a dynamically allocated object.
Could this have to do with using a vfork-like mechanism instead of fork, so that the capability is passed but the mapping no longer exists for the child? I could verify that the segmentation fault appears only after clone, in the child.
I was able to set up an environment for using gdb. The segfault always happens at repos/base/src/lib/base/heap.cc:217, which is the beginning of bool Heap::alloc(size_t size, void **out_addr). out_addr had the value 0x7fffe7ffe018, and when I tried to get a backtrace, gdb said that it cannot access 0x7fffe7ffe008. Both addresses lie within the page at 0x7fffe7ffe000, which is the return value of the first lx_mmap after clone (unfortunately I cannot tell whether this happens in the parent or in the child).
  Thread 2.1 "ld.lib.so" received signal SIGSEGV, Segmentation fault.
  0x00007fffdff47732 in Genode::Heap::alloc (
      this=0x7fffe00945e0 <Genode::Heap* unmanaged_singleton<Genode::Heap, 8,
        Genode::Pd_session*, Genode::Region_map*, Genode::Heap::{unnamed type#1},
        char (&) [8192], unsigned long>(Genode::Pd_session*&&, Genode::Region_map*&&,
        Genode::Heap::{unnamed type#1}&&, char (&) [8192],
        unsigned long&&)::object_space>,
      size=140, out_addr=0x7fffe7ffe018)
      at /media/sf_kernel/genode/repos/base/src/lib/base/heap.cc:217
  217     {
  (gdb) bt
  #0  0x00007fffdff47732 in Genode::Heap::alloc (
      this=0x7fffe00945e0 <Genode::Heap* unmanaged_singleton<Genode::Heap, 8,
        Genode::Pd_session*, Genode::Region_map*, Genode::Heap::{unnamed type#1},
        char (&) [8192], unsigned long>(Genode::Pd_session*&&, Genode::Region_map*&&,
        Genode::Heap::{unnamed type#1}&&, char (&) [8192],
        unsigned long&&)::object_space>,
      size=140, out_addr=0x7fffe7ffe018)
      at /media/sf_kernel/genode/repos/base/src/lib/base/heap.cc:217
  Backtrace stopped: Cannot access memory at address 0x7fffe7ffe008
I still cannot tell what happens there, but it doesn't seem to be component-specific. I created a config with only a single component that merely prints a single log message, and the problem still occurs.
I hope this information is useful, and thank you for your help, Johannes
Hello Johannes,
RAM dataspaces are used as backing store for any dynamically allocated memory. So it is not surprising to see this fault in the heap. The segmentation fault in the lock implementation is certainly caused by a lock that resides within a dynamically allocated object.
Could this have to do with using a vfork-like mechanism instead of fork, so that the capability is passed but the mapping no longer exists for the child? I could verify that the segmentation fault appears only after clone, in the child.
the problem could very well be related to the process-creation mechanism. All processes are created by forking from core (via clone). Look out for 'lx_create_process'. Dataspaces are normally attached via the 'MAP_SHARED' flag passed to 'mmap'. Could it be that the mmap handling of your kernel module somehow fails to consider this flag? (not that I would know how to handle it)
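For reference, the attach-side mapping has roughly the following shape (simplified to plain POSIX names; the real code goes through the 'lx_mmap' wrapper with flags derived from the attach arguments):

  #include <sys/mman.h>

  /* a dataspace must be mapped MAP_SHARED so that a process created via
     clone() still refers to the very same pages as its parent */
  static void *attach_dataspace(int fd, size_t size)
  {
      return mmap(nullptr, size, PROT_READ | PROT_WRITE,
                  MAP_SHARED, fd, 0);
  }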
Both addresses lie within the page at 0x7fffe7ffe000, which is the return value of the first lx_mmap after clone (unfortunately I cannot tell whether this happens in the parent or in the child).
To get a clearer picture, an instrumentation as illustrated in the attached patch can be helpful. The 'wait_for_continue' function waits until the user presses return. By placing it in the process-creation code path, you can probe the state at various points. By printing the return value of 'lx_gettid()' in your messages, you can see who is talking. The 'wait_for_continue' hook also provides you with a convenient way to attach GDB *before* the interesting code parts are executed. By using the 'Genode::raw' function instead of 'Genode::log' for log output, messages go directly to the kernel, not to core's LOG service. So you can instrument lowest-level details (like IPC and process creation) without fearing any deadlocks.
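A probe point could look roughly like this (the 'wait_for_continue' helper comes from the attached patch, 'lx_gettid' from base-linux's 'linux_syscalls.h', so include those accordingly):

  #include <base/log.h>

  /* sketch of a probe point to sprinkle along the process-creation path */
  static void probe_point(char const *where)
  {
      /* 'raw' bypasses core's LOG service and goes straight to the kernel */
      Genode::raw(where, " reached by tid=", lx_gettid());

      /* block until return is pressed: attach GDB now, or inspect
         /proc/<PID>/maps of core and init */
      wait_for_continue();
  }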
In order to let the key presses reach Genode, you need to change the run script to use 'run_genode_until forever', which puts the expect tool in interactive mode. The attached patch changes the 'run/log.run' script as an example.
The instrumentations with 'wait_for_continue' also allow you to inspect '/proc/<PID>/maps' of core and init at various stages of the process creation. Thereby you can learn which backing store lies behind each virtual-address area. For example, if a mapping is missing in init (the forked process) right after clone, eventually producing a segmentation fault, you can look into the 'maps' file of core to see the origin (file) of the original mapping at the fault address.
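If matching addresses against 'maps' by hand becomes tedious, a small host-side helper (ordinary Linux C++, not Genode code) can print the mapping of a given process that covers a fault address:

  #include <cstdint>
  #include <cstdlib>
  #include <fstream>
  #include <iostream>
  #include <sstream>
  #include <string>

  /* usage: maps_lookup <pid> <hex-address>
     prints the /proc/<pid>/maps line whose range covers the address */
  int main(int argc, char **argv)
  {
      if (argc != 3) {
          std::cerr << "usage: " << argv[0] << " <pid> <addr>\n";
          return 1;
      }

      uint64_t addr = std::strtoull(argv[2], nullptr, 16);
      std::ifstream maps("/proc/" + std::string(argv[1]) + "/maps");

      std::string line;
      while (std::getline(maps, line)) {
          uint64_t start = 0, end = 0;
          char dash = 0;
          std::istringstream in(line);
          in >> std::hex >> start >> dash >> end;
          if (start <= addr && addr < end) {
              std::cout << line << '\n';   /* shows perms and backing file */
              return 0;
          }
      }
      std::cerr << "address not covered by any mapping\n";
      return 1;
  }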
I hope that these debugging hints are of help for your further investigation.
Cheers Norman