pager in Genode

Norman Feske norman.feske at ...1...
Fri May 13 11:22:02 CEST 2011


Hello Chen,

> I am trying to understand the memory management and pager implementation
> in Genode.  I am stuck on Rm_client::pager function in
>  rm_session_component.cc. Can somebody explain to me the main idea
> behind it, especially the reverse_lookup function? I must have missed
> some important data structures or designs of the memory management in
> RAM or Rm.  Thanks.

you have just discovered one of the most sophisticated parts of Genode's
core. The code is not easy to explain in a few sentences, but I will try.

1. Unveiling the identity of the thread to blame for a page fault

For each paged entity in the system (on most platforms, this is a
thread), there exists a corresponding 'Pager_object' within core. When a
page fault occurs, the low-level platform-specific page-fault handler
(see 'base-foc/src/base/pager/pager.cc') receives the page fault
information from the kernel and uses this information to find the
'Pager_object' belonging to the faulted thread. If the lookup succeeds,
the virtual function 'Pager_object::pager' gets called. This virtual
function is implemented by 'Rm_client' ('Rm_client' inherits the
interface of 'Pager_object'). So here we are, surfacing right in the
generic code in 'rm_session_component.cc'.
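
To illustrate the dispatch idea, here is a rough sketch - not the actual
Genode code, all names besides 'Pager_object', 'Rm_client', and 'pager'
are made up for illustration. The platform code maps a kernel-provided
badge to a 'Pager_object' and the virtual call lands in the generic
'Rm_client' implementation:

```cpp
#include <cassert>
#include <map>

// Hypothetical sketch: per paged thread, one 'Pager_object' in core
struct Pager_object {
    virtual ~Pager_object() { }
    // returns 0 if the fault could be resolved (illustrative convention)
    virtual int pager(unsigned long fault_addr) = 0;
};

// 'Rm_client' inherits the 'Pager_object' interface
struct Rm_client : Pager_object {
    unsigned long last_fault = 0;
    int pager(unsigned long fault_addr) override {
        last_fault = fault_addr;  /* generic fault resolution goes here */
        return 0;
    }
};

// platform-specific entry point: badge -> 'Pager_object' -> virtual call
int handle_fault(std::map<unsigned long, Pager_object*> const &registry,
                 unsigned long badge, unsigned long fault_addr)
{
    auto it = registry.find(badge);
    if (it == registry.end())
        return -1;                 /* unknown faulter */
    return it->second->pager(fault_addr);  /* ends up in Rm_client::pager */
}
```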

2. Finding the living space of the faulter and his family

The 'Rm_client::pager' function has the mission to find a mapping that
is able to resolve the page fault. Because the 'Rm_client' is a
'Pager_object', the object context of the 'pager' function corresponds
to the faulting thread. To find the mapping, the 'Rm_client' needs the
address-space layout which the faulting thread is using. This
address-space layout is represented by a region-manager session (RM
session). Within core, the implementation of the RM session is the
'Rm_session_component' class. Each Genode process has one RM session
that describes the process' address space. If multiple threads are
executed within the same process, each thread has a dedicated
'Pager_object' but all threads share the same RM session.
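
The ownership relation can be pictured as follows - again just a sketch
with illustrative names, not the real class definitions:

```cpp
#include <cassert>

// one address-space layout per process (sketch of the RM session)
struct Rm_session_sketch { };

// one per thread; all threads of a process point at the same session
struct Rm_client_sketch {
    Rm_session_sketch *rm;
    Rm_client_sketch(Rm_session_sketch *rm) : rm(rm) { }
};
```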

3. Inspecting the RM session in desperation for a dataspace

An RM session is like a generalized page table. Similarly to a page table
that references physical pages as memory objects, an RM session
references dataspaces as memory objects. In the normal case, such a
dataspace is backed by a contiguous physical memory area. The 'pager'
function uses the fault address as a key into the region map stored
within the RM session. It thereby obtains a so-called 'Rm_region' object
(analogously to a page-table entry when the MMU looks up page-table
structures).
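
The lookup itself can be sketched with a sorted map keyed by region base
addresses (hypothetical code, the actual 'Rm_region' data structure in
core differs):

```cpp
#include <cassert>
#include <map>

// Illustrative stand-in for 'Rm_region': a range plus the referenced
// memory object (here just an id)
struct Region { unsigned long base, size; int dataspace_id; };

// find the region whose [base, base+size) range covers the fault
// address, or nullptr if no region is attached there
Region const *lookup(std::map<unsigned long, Region> const &regions,
                     unsigned long fault_addr)
{
    auto it = regions.upper_bound(fault_addr);
    if (it == regions.begin())
        return nullptr;
    --it;  // candidate region starting at or below the fault address
    Region const &r = it->second;
    return (fault_addr < r.base + r.size) ? &r : nullptr;
}
```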

However, in contrast to a page table entry, an RM region is more
flexible with respect to the size of memory objects as the size of a
dataspace can be any number of pages. Furthermore, when attaching a
dataspace to an RM session, it is possible to make only a part of the
dataspace visible in the address space. This view window is stored
alongside the dataspace reference in the 'Rm_region' object. So now that
we have found the correct 'Rm_region', we have found a mapping, haven't
we? In principle yes, but we are not satisfied to have found just a valid
mapping. We want to use the best mapping possible.

4. Good mappings, better mappings, the best mapping

A simple implementation of the pager function could just return a
single-page mapping for each page fault. But on architectures with
support for different page sizes, this would pollute the TLB with lots
of entries even for large virtual memory ranges that refer to a
contiguous physical memory area. This would be unfortunate on x86 (where
we would like to use superpages). Worse, on platforms such as Xilinx
Microblaze with support for 6 different page sizes, using only one page
size is a performance killer (on the order of a factor of 10). So what is
best mapping? A mapping is described as a source (physical address and
size) and destination (virtual address and size). We search for a pair
of source and destination with the following properties:

* Source and destination must have the same size
* The mapping size must be one of the page sizes supported by the
  platform, as large as possible
* The source must be completely contained in the visible dataspace
  window defined in the 'Rm_region'
* The physical backing store of the source must have the same
  alignment as the destination

To represent the mapping source and destination, the code in
'rm_session_component.cc' uses the 'Fault_area' utility class. The code
starts with considering the maximum source and destination fault area
and successively applies constraints to them. E.g., the source fault
area gets constrained by the dataspace size (see the code in
'reverse_lookup').
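
The constraint procedure can be approximated like this - a simplified
sketch of the 'Fault_area' idea with made-up function names, assuming the
supported page sizes are given largest first:

```cpp
#include <cassert>

// Pick the largest supported page size for which source (physical) and
// destination (virtual) are equally aligned and the size-aligned source
// page lies entirely within the visible dataspace window.
unsigned long best_mapping_size(unsigned long src_addr,
                                unsigned long dst_addr,
                                unsigned long window_base,
                                unsigned long window_size,
                                unsigned long const *supported,
                                unsigned num_sizes)
{
    for (unsigned i = 0; i < num_sizes; i++) {
        unsigned long const sz = supported[i];

        // source and destination must share their alignment below 'sz'
        if ((src_addr ^ dst_addr) & (sz - 1))
            continue;

        // the size-aligned page around the fault must fit the window
        unsigned long const page = src_addr & ~(sz - 1);
        if (page < window_base)
            continue;
        if (page + sz > window_base + window_size)
            continue;

        return sz;  // sizes are ordered largest first
    }
    return 0;  // no supported page size fits
}
```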

5. A RM session is a dataspace is a RM session is a dataspace...

So now, things get sophisticated. RM sessions are not just more general
than page tables with regard to the referred memory objects, but they
can be arbitrarily nested. That is, an RM session can be used as a
dataspace and, thereby, can be attached to another RM session. Such a
nested RM session is called a managed dataspace. Revisiting the
'Rm_region' lookup procedure described above, the support for managed
dataspaces explains the iterative call of the 'reverse_lookup' function.
If the 'reverse_lookup' returns a dataspace that is a managed dataspace,
we need to dive into the RM session that describes the layout of the
managed dataspace. Of course, the mapping constraints must be applied at
each stage to find the optimal mapping even when traversing nested
managed dataspaces.
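
The nesting can be sketched as a recursive lookup, translating the fault
address into the nested region map at each level (hypothetical types, not
the real core data structures, and the constraint narrowing is omitted):

```cpp
#include <cassert>
#include <vector>

struct Region_map;

// a region refers either to a leaf dataspace or to another region map
// (the latter being a managed dataspace)
struct Region {
    unsigned long base, size;
    unsigned long offset;      // offset into the attached object
    int leaf_ds;               // valid if 'nested' is null
    Region_map const *nested;  // non-null for a managed dataspace
};

struct Region_map { std::vector<Region> regions; };

// returns the leaf dataspace id, or -1 if the lookup hits a hole
int resolve(Region_map const &rm, unsigned long addr)
{
    for (Region const &r : rm.regions) {
        if (addr < r.base || addr >= r.base + r.size)
            continue;
        unsigned long const local = addr - r.base + r.offset;
        return r.nested ? resolve(*r.nested, local)  // dive one level
                        : r.leaf_ds;
    }
    return -1;  // no region attached -> fault reflection (see step 6)
}
```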

6. No dataspace found - don't lose your head, someone may come to
   the rescue

In both cases (lookup in the immediate process' RM session or within a
managed dataspace), the lookup for a 'Rm_region' may fail (e.g., if
there is no leaf dataspace attached at the fault address). In this case,
core reflects this condition to the RM session where the lookup failed
by delivering an RM fault signal. This way, the creator of an RM session
can receive a notification each time someone tries to access an
unpopulated part of the RM session (the
signal handler can be installed by using the 'Rm_session::fault_handler'
function). When such a condition happens, the faulting thread gets
halted and becomes registered at the RM session via the
'Rm_session_component::fault' function. Now, the 'Rm_client' plays the
role of a 'Rm_faulter' (hence, 'Rm_client' inherits the 'Rm_faulter'
interface). Each time a dataspace gets attached to an RM session, the
list of faulters gets revisited. If there is a faulter that faulted at
an address covered by the new 'Rm_region', the faulter gets woken up.
This way, managed dataspaces can be paged on demand (an example can be
found at 'base/src/test/rm_fault').
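
The faulter bookkeeping boils down to a wait list, roughly like this
(again an illustrative sketch, not the actual 'Rm_faulter' code):

```cpp
#include <cassert>
#include <vector>

// a thread parked at the RM session after faulting in a hole
struct Faulter { unsigned long fault_addr; bool blocked = false; };

struct Rm_session_sketch {
    std::vector<Faulter*> faulters;

    // register a faulting thread (sketch of 'Rm_session_component::fault')
    void fault(Faulter &f) {
        f.blocked = true;
        faulters.push_back(&f);
    }

    // on 'attach': wake every faulter covered by the new region
    void attach(unsigned long base, unsigned long size) {
        for (auto it = faulters.begin(); it != faulters.end(); )
            if ((*it)->fault_addr >= base
             && (*it)->fault_addr <  base + size) {
                (*it)->blocked = false;  // resume the thread
                it = faulters.erase(it);
            } else {
                ++it;
            }
    }
};
```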

If there is no signal handler registered for an RM session and a lookup
fails, core tells us about an invalid 'Signal-context capability' as you
may have seen before. This message corresponds to a "classical"
segmentation fault.


I hope this description demystifies the page-fault handling code a bit.
There is a lot of meat in there - it usually takes a while to wrap one's
brain around it.

Cheers
Norman

PS: Chen, could you please subscribe to the mailing list? Otherwise,
the list admin has to explicitly acknowledge each message you post.


-- 
Dr.-Ing. Norman Feske
Genode Labs

http://www.genode-labs.com · http://genode.org

Genode Labs GmbH · Amtsgericht Dresden · HRB 28424 · Sitz Dresden
Geschäftsführer: Dr.-Ing. Norman Feske, Christian Helmuth



