Chen wrote:
> I am trying to understand the memory management and pager implementation
> in Genode. I am stuck on the 'Rm_client::pager' function in
> 'rm_session_component.cc'. Can somebody explain the main idea behind it,
> especially the 'reverse_lookup' function? I must have missed some
> important data structures or design aspects of the memory management in
> RAM or RM sessions. Thanks.

Hello Chen,

you have just discovered one of the most sophisticated parts of Genode's core. The code is not easy to explain in a few sentences, but I will try.
1. Unveiling the identity of the thread to blame for a page fault
For each paged entity in the system (on most platforms, this is a thread), there exists a corresponding 'Pager_object' within core. When a page fault occurs, the low-level platform-specific page-fault handler (see 'base-foc/src/base/pager/pager.cc') receives the page fault information from the kernel and uses this information to find the 'Pager_object' belonging to the faulted thread. If the lookup succeeds, the virtual function 'Pager_object::pager' gets called. This virtual function is implemented by 'Rm_client' ('Rm_client' inherits the interface of 'Pager_object'). So here we are, surfacing right in the generic code in 'rm_session_component.cc'.
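To illustrate the dispatch path, here is a heavily condensed C++ sketch. Except for 'Pager_object', 'Rm_client', and 'Ipc_pager', the names ('pager_loop', 'lookup_faulter', ...) are made up for illustration and do not match the real identifiers:

  struct Ipc_pager { /* carries fault address, IP, write flag, ... */ };

  /* one instance per paged entity (thread), living in core */
  struct Pager_object
  {
    virtual ~Pager_object() { }

    /* called on a page fault of the thread, returns 0 on success */
    virtual int pager(Ipc_pager &pager) = 0;
  };

  /* generic implementation in rm_session_component.cc */
  struct Rm_client : Pager_object
  {
    int pager(Ipc_pager &) override { /* find and install a mapping */ return 0; }
  };

  /* hypothetical helpers standing in for the kernel-specific code */
  void          wait_for_fault(Ipc_pager &);
  Pager_object *lookup_faulter(Ipc_pager &);
  void          reply_with_mapping(Ipc_pager &);

  /* sketch of the platform-specific page-fault loop */
  void pager_loop(Ipc_pager &pager)
  {
    for (;;) {
      wait_for_fault(pager);                      /* block until a fault occurs */
      Pager_object *obj = lookup_faulter(pager);  /* faulting thread -> object  */
      if (obj && obj->pager(pager) == 0)
        reply_with_mapping(pager);                /* resume the faulting thread */
    }
  }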
2. Finding the living space of the faulter and his family
The 'Rm_client::pager' function has the mission to find a mapping that is able to resolve the page fault. Because the 'Rm_client' is a 'Pager_object', the object context of the 'pager' function corresponds to the faulting thread. To find the mapping, the 'Rm_client' needs the address-space layout which the faulting thread is using. This address-space layout is represented by a region-manager session (RM session). Within core, the implementation of the RM session is the 'Rm_session_component' class. Each Genode process has one RM session that describes the process' address space. If multiple threads are executed within the same process, each thread has a dedicated 'Pager_object' but all threads share the same RM session.
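Re-sketching 'Rm_client' with the address-space link added (the member name is again made up):

  struct Rm_session_component;  /* core-side RM-session implementation */

  struct Rm_client : Pager_object
  {
    /* all 'Rm_client' objects (threads) of one process share this session */
    Rm_session_component *_address_space;

    int pager(Ipc_pager &) override
    {
      /* resolve the fault using the layout stored in '_address_space' */
      return 0;
    }
  };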
3. Inspecting the RM session in desperation for a dataspace
An RM session is like a generalized page table. Similar to a page table that references physical pages as memory objects, an RM session references dataspaces as memory objects. In the normal case, such a dataspace is backed by a contiguous physical memory area. The 'pager' function uses the fault address as a key into the region map stored within the RM session. It thereby obtains a so-called 'Rm_region' object (analogous to a page-table entry obtained when the MMU walks the page-table structures).
However, in contrast to a page-table entry, an RM region is more flexible with respect to the size of memory objects because the size of a dataspace can be any number of pages. Furthermore, when attaching a dataspace to an RM session, it is possible to make only a part of the dataspace visible in the address space. This view window is stored alongside the dataspace reference in the 'Rm_region' object (see the sketch below). So now that we have found the correct 'Rm_region', we have found a mapping, have we? In principle yes, but we are not satisfied with just any valid mapping. We want to use the best mapping possible.
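As a rough sketch of the data structure (the field names are my shorthand, and the real code stores the regions differently - a linear scan just keeps the sketch simple):

  typedef unsigned long addr_t;

  struct Dataspace_component;  /* the memory object, e.g., a physical RAM range */

  /* one entry of the region map, analogous to a page-table entry */
  struct Rm_region
  {
    addr_t               base;    /* virtual start within the RM session  */
    unsigned long        size;    /* size of the visible dataspace window */
    Dataspace_component *dsc;     /* the attached dataspace               */
    unsigned long        offset;  /* start of the window in the dataspace */
  };

  /* find the region covering 'fault_addr', 0 if nothing is attached there */
  Rm_region *lookup(Rm_region *regions, unsigned num, addr_t fault_addr)
  {
    for (unsigned i = 0; i < num; i++)
      if (fault_addr >= regions[i].base
       && fault_addr <  regions[i].base + regions[i].size)
        return &regions[i];
    return 0;
  }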
4. Good mappings, better mappings, the best mapping
A simple implementation of the pager function could just return a single-page mapping for each page fault. But on architectures with support for different page sizes, this would pollute the TLB with lots of entries even for large virtual memory ranges that refer to a contiguous physical memory area. This would be unfortunate on x86 (where we would like to use superpages). Worse, on platforms such as Xilinx Microblaze with its support for 6 different page sizes, using only one page size is a performance killer (we are talking about a factor of 10). So what is the best mapping? A mapping is described by a source (physical address and size) and a destination (virtual address and size). We search for a pair of source and destination with the following properties:
* Source and destination must have the same size
* The mapping size must be one of the page sizes supported by the platform, as large as possible
* The source must be completely contained in the visible dataspace window defined in the 'Rm_region'
* The physical backing store of the source must have the same alignment as the destination
To represent the mapping source and destination, the code in 'rm_session_component.cc' uses the 'Fault_area' utility class. The code starts by considering the maximum source and destination fault areas and successively applies constraints to them. For example, the source fault area gets constrained by the dataspace size (see the code in 'reverse_lookup').
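To make the idea concrete, here is a small self-contained sketch (my own simplification, not the actual 'Fault_area' code): it picks the largest supported page size such that the page fits into both the source and the destination window and both addresses are aligned to it:

  #include <cstdio>

  typedef unsigned long addr_t;

  /*
   * Return the log2 of the largest supported page size usable for mapping
   * 'src' to 'dst'. 'sizes_log2' must be sorted, largest first.
   */
  unsigned fitting_page_size_log2(addr_t src, addr_t dst,
                                  unsigned long src_avail,
                                  unsigned long dst_avail,
                                  unsigned const *sizes_log2, unsigned num)
  {
    for (unsigned i = 0; i < num; i++) {
      unsigned long const size = 1UL << sizes_log2[i];
      unsigned long const mask = size - 1;

      bool const aligned = !(src & mask) && !(dst & mask);
      bool const fits    = (size <= src_avail) && (size <= dst_avail);

      if (aligned && fits)
        return sizes_log2[i];
    }
    return sizes_log2[num - 1];  /* fall back to the smallest page size */
  }

  int main()
  {
    unsigned const x86_sizes[] = { 22, 12 };  /* 4M superpage, 4K page */

    /* 4M-aligned source and destination with 4M available -> superpage (22) */
    std::printf("log2 map size: %u\n",
                fitting_page_size_log2(0x400000, 0x800000,
                                       1UL << 22, 1UL << 22, x86_sizes, 2));
  }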
5. A RM session is a dataspace is a RM session is a dataspace...
So now, things get sophisticated. RM sessions are not just more general than page tables with regard to the referenced memory objects, they can also be arbitrarily nested. That is, an RM session can be used as a dataspace and, thereby, can be attached to another RM session. Such a nested RM session is called a managed dataspace. Revisiting the 'Rm_region' lookup procedure described above, the support for managed dataspaces explains the iterative call of the 'reverse_lookup' function. If 'reverse_lookup' returns a dataspace that is a managed dataspace, we need to dive into the RM session that describes the layout of the managed dataspace. Of course, the mapping constraints must be applied at each stage to find the optimal mapping even when traversing nested managed dataspaces.
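In pseudo-C++ (all helper names are made up, the real code differs in its details), the nested lookup boils down to a loop like this:

  struct Fault_area { /* address and log2 size, constrained step by step */ };

  struct Rm_session_component;

  struct Dataspace_component
  {
    bool                  managed;  /* backed by an RM session?   */
    Rm_session_component *sub_rm;   /* valid if 'managed' is true */
  };

  /* constrain 'dst'/'src' and return the dataspace behind the fault address */
  Dataspace_component *reverse_lookup(Rm_session_component *, Fault_area *dst,
                                      Fault_area *src);

  void install_mapping(Fault_area const &src, Fault_area const &dst);
  void reflect_fault(Rm_session_component *);  /* deliver the RM fault signal */

  enum { MAX_NESTING = 4 };  /* arbitrary limit for the sketch */

  void resolve(Rm_session_component *rm, Fault_area dst)
  {
    Fault_area src;

    for (unsigned i = 0; i < MAX_NESTING; i++) {

      Dataspace_component *ds = reverse_lookup(rm, &dst, &src);

      if (!ds) { reflect_fault(rm); return; }  /* nothing attached here */

      if (!ds->managed) { install_mapping(src, dst); return; }  /* leaf found */

      /* managed dataspace: descend into the RM session behind it */
      rm  = ds->sub_rm;
      dst = src;
    }
  }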
6. No dataspace found - don't lose your head, there may be someone to the rescue
In both cases (lookup in the immediate process' RM session or within a managed dataspace), the lookup for an 'Rm_region' may fail (e.g., if there is no leaf dataspace attached at the fault address). In this case, core tries to reflect this condition to the RM session where the lookup failed by delivering an RM fault signal. This way, the creator of an RM session can receive a notification each time someone tries to access an unpopulated part of the RM session (the signal handler can be installed via the 'Rm_session::fault_handler' function). When such a condition occurs, the faulting thread gets halted and becomes registered at the RM session via the 'Rm_session_component::fault' function. Now, the 'Rm_client' plays the role of an 'Rm_faulter' (hence, 'Rm_client' inherits the 'Rm_faulter' interface). Each time a dataspace gets attached to an RM session, the list of faulters gets revisited. If there is a faulter that faulted at an address covered by the new 'Rm_region', the faulter gets woken up. This way, managed dataspaces can be paged on demand. (An example can be found at 'base/src/test/rm_fault'.)
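In the spirit of that test, a condensed usage sketch (I have shortened the error handling, and the actual test differs in its details):

  #include <base/env.h>
  #include <base/signal.h>
  #include <rm_session/connection.h>

  using namespace Genode;

  enum { MANAGED_SIZE = 1024*1024, PAGE_SIZE = 4096 };

  int main()
  {
    /* create a managed dataspace of 1 MiB, initially unpopulated */
    Rm_connection rm(0, MANAGED_SIZE);

    /* install the fault handler for the managed dataspace */
    Signal_receiver receiver;
    Signal_context  context;
    rm.fault_handler(receiver.manage(&context));

    /* hand out 'rm.dataspace()' to a client that attaches and accesses it */

    for (;;) {
      receiver.wait_for_signal();            /* someone faulted within 'rm' */

      Rm_session::State state = rm.state();  /* fault type and address */

      /* back the faulting page with RAM - this wakes up the faulter */
      rm.attach(env()->ram_session()->alloc(PAGE_SIZE), 0, 0,
                true, state.addr & ~(addr_t)(PAGE_SIZE - 1));
    }
    return 0;
  }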
If there is no signal handler registered for an RM session and a lookup fails, core tells us about an invalid 'Signal-context capability', as you may have seen before. This message corresponds to a "classical" segmentation fault.
I hope this description demystifies the page-fault handling code a bit. There is a lot of meat in there - it usually takes a while to wrap one's brain around it.
Cheers
Norman
PS: Chen, could you please subscribe to the mailing list? Otherwise, the list admin has to explicitly acknowledge each message you post.