Data space creation and physical memory allocation strategy.

Fri Jun 10 15:48:35 CEST 2011

Hi Chen,

> When a data space is created, physical memory is reserved and cannot be
> used by other processes until the data space is destructed. I wonder if
> it would be a possible situation where physical memory may get wasted.
> For example, an application may first allocate a huge chunk of memory
> (through malloc, which will eventually lead to the creation of one or
> more data spaces), but only touch a small portion of it throughout the
> entire execution. Another example would be to have a large binary where
> a significant portion of the binary is not actually executed. Thus,
> reserving physical memory for these instructions when a program is
> loaded may not be good, especially in a resource-constrained environment.

You've raised a very good point. Genode does not pretend to offer an
unlimited amount of physical resources to applications. The assignment,
allocation, and trading of physical resources is always explicit. No
physical resource can be allocated twice at the same time. This way,
Genode guarantees the availability of a physical resource for an
application after a successful allocation. This is not the case for
traditional operating systems, which appear to offer unlimited resources
to each application and implement clever strategies to uphold this
illusion as far as possible. Because the resources are limited, however,
this illusion will inevitably break at some point. Quality of service is
sacrificed to achieve high utilization.

On Genode, the creation of a RAM dataspace implies the reservation of a
physical memory range regardless of how this dataspace is used
afterwards. In the worst case, a huge dataspace may be created, which is
then only sparsely used. This happens in the malloc implementation of
the FreeBSD libc, which typically issues an anonymous 'mmap' for a large
memory area right at initialization time. On UNIX, this is no problem
because the backing store for this virtual memory area gets allocated on
demand. On Genode, however, the process must be assigned a large RAM
quota in order to create the large dataspace, even if it never uses most
of it. This problem can be addressed in two different ways: by avoiding
sparsely populated dataspaces or by using managed dataspaces.
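
To make the accounting concrete, here is a minimal standalone sketch of
the reservation rule described above. It is a model only, not Genode's
API: the class 'Ram_quota_model' and its functions are hypothetical
names made up for illustration. The point it shows is that the full
size of a dataspace is deducted from the quota at creation time, no
matter how much of it is touched later.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical model of an explicit RAM quota (not a Genode interface)
class Ram_quota_model
{
	private:

		std::size_t _avail;   // bytes that can still be reserved

	public:

		explicit Ram_quota_model(std::size_t quota) : _avail(quota) { }

		// Creating a dataspace reserves its full size immediately,
		// regardless of how much of it will ever be accessed
		void alloc_dataspace(std::size_t size)
		{
			if (size > _avail)
				throw std::runtime_error("RAM quota exceeded");

			_avail -= size;
		}

		// Destroying the dataspace hands the reservation back
		void free_dataspace(std::size_t size) { _avail += size; }

		std::size_t avail() const { return _avail; }
};
```

With a quota of 64 MiB, a sparsely used 48 MiB dataspace leaves only
16 MiB for everything else, and a further 32 MiB request is refused
instead of being silently overcommitted.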

For the libc, we "fixed" the problem by replacing the original malloc
implementation with a simple wrapper around Genode's 'env()->heap()',
which has a more conservative backing-store allocation strategy and
avoids the initial allocation of a large chunk of anonymous memory. This
solution serves us well for our current use cases.
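
The idea of a conservative backing-store strategy can be sketched as
follows. This is not the code of Genode's heap or of our libc wrapper.
It is a simplified standalone model with made-up names ('Chunked_heap',
'CHUNK_BYTES') and an assumed chunk size of 16 KiB, and it leaves out
freeing entirely. It only illustrates that backing store is obtained in
small steps as allocations arrive rather than as one huge block at
initialization time.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Assumed growth granularity, chosen for illustration only
constexpr std::size_t CHUNK_BYTES = 16*1024;

// Hypothetical bump allocator that grows its backing store on demand
class Chunked_heap
{
	private:

		struct Chunk
		{
			std::unique_ptr<unsigned char[]> mem;
			std::size_t size;
			std::size_t used;
		};

		std::vector<Chunk> _chunks;
		std::size_t        _backing = 0;   // bytes of backing store held

	public:

		void *alloc(std::size_t size)
		{
			// round the request up to a multiple of 16 bytes
			size = (size + 15) & ~std::size_t(15);

			// obtain a new chunk only if the current one cannot
			// satisfy the request
			if (_chunks.empty()
			 || _chunks.back().size - _chunks.back().used < size) {

				std::size_t const chunk_size =
					size > CHUNK_BYTES ? size : CHUNK_BYTES;

				_chunks.push_back(Chunk {
					std::unique_ptr<unsigned char[]>(
						new unsigned char[chunk_size]),
					chunk_size, 0 });

				_backing += chunk_size;
			}

			Chunk &chunk = _chunks.back();
			void  *ptr   = chunk.mem.get() + chunk.used;
			chunk.used  += size;
			return ptr;
		}

		std::size_t backing_store() const { return _backing; }
};
```

In this model, a freshly constructed heap holds no backing store at
all, and a handful of small allocations costs a single chunk.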

However, we expect cases where the approach of "fixing" the application
does not work, or where overprovisioning of resources is a desired
feature. I am thinking of the Noux UNIX emulation environment, which
should allow overprovisioning of resources to classical UNIX programs,
along the lines of how a real UNIX kernel works. This is where managed
dataspaces come into play. Each RM session can be paged on demand by
installing a signal handler using the 'rm_session->fault_handler()'
function. This way, the creator of the RM session receives a
notification each time a page fault occurs at an address that is not
backed by a dataspace attachment. It can then request the fault address
and attach a dataspace at that address. Thereby, the page fault gets
resolved and the faulted thread resumes its execution. For an example of
how this works, please refer to:

  base/src/test/rm_fault
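
For readers without the source tree at hand, the control flow of the
fault-handling mechanism can be pictured with the following standalone
model. It does not use the real RM session interface. The class
'Region_map_model', the integer dataspace IDs, and the synchronous
callback are simplifications I made up for illustration. In Genode, the
notification arrives as a signal and the handler runs in the pager's
context, not as a direct function call.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <stdexcept>
#include <utility>

// Assumed page size for this model
constexpr std::size_t PAGE_BYTES = 4096;

// Hypothetical stand-in for an RM session with a fault handler
class Region_map_model
{
	public:

		using Fault_handler =
			std::function<void (Region_map_model &, std::size_t)>;

	private:

		std::map<std::size_t, int> _attached;   // page base -> dataspace ID
		Fault_handler              _handler;
		unsigned                   _faults = 0;

		static std::size_t _page(std::size_t addr)
		{
			return addr & ~(PAGE_BYTES - 1);
		}

	public:

		void fault_handler(Fault_handler handler)
		{
			_handler = std::move(handler);
		}

		void attach_at(int dataspace_id, std::size_t addr)
		{
			_attached[_page(addr)] = dataspace_id;
		}

		// Models a memory access of a thread. An access to a page
		// without attachment triggers the fault handler, after which
		// the access is retried.
		int access(std::size_t addr)
		{
			if (_attached.count(_page(addr)) == 0) {
				++_faults;

				if (_handler)
					_handler(*this, addr);

				if (_attached.count(_page(addr)) == 0)
					throw std::runtime_error("unresolved page fault");
			}
			return _attached.at(_page(addr));
		}

		unsigned faults() const { return _faults; }
};
```

A handler that attaches a fresh dataspace at each fault address yields
the on-demand behaviour described above: the first access to a page
faults once, later accesses to the same page do not.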

This mechanism alone would suffice to execute a child process that is
completely on-demand-paged by the parent (the parent would just need to
install itself as fault handler for the child's RM session). So
classical UNIX semantics could be emulated. But there is another handy
twist to it, namely managed dataspaces.

A managed dataspace is an RM session used as a dataspace. The RM session
interface has a 'dataspace' function, which returns a dataspace
capability. This capability can be used to attach the address space
described by the RM session as a region to any other RM session.
Thereby, it becomes possible to employ different on-demand paging
strategies for different parts of the address space. I admit, this
sounds a bit esoteric, so let's look at a practical example, which
comes in the form of the iso9660 server:

  os/src/server/iso9660

This server offers the content of a CD-ROM as individual ROM sessions.
By simply routing ROM session requests to iso9660 instead of core,
programs can access files from CD-ROM instead of core's boot modules in
a completely transparent manner. A ROM session carries a donation of
just a few KB of memory, which is not enough to hold the content of an
arbitrarily sized file stored on the CD-ROM. The solution is to
implement the ROM service using managed dataspaces. For each client, a
managed dataspace is created. Initially, this is not backed by physical
memory at all. But once the client has attached the dataspace and starts
accessing it, page faults come in. The iso9660 server handles those page
faults by reading the corresponding blocks from the block device into a
RAM dataspace (backing store) and attaching that part of the backing
store to the managed dataspace at the fault address. If the backing
store becomes crowded, the server makes room by evicting older parts and
detaching them from the managed dataspace. Consequently, the client can
open files of any size.
The block device access follows the access pattern of the client. For
the client, the underlying mechanism remains completely transparent.
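
The paging scheme just described can be sketched as a standalone model.
This is not the code of the iso9660 server. The names
'Paged_file_model' and 'read_block', the fake block device, the 2048
byte block size, and the FIFO eviction order are all assumptions chosen
to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <vector>

// Assumed block size for this model
constexpr std::size_t BLOCK_BYTES = 2048;

// Hypothetical block device: the byte at file offset n has value n % 251
static void read_block(std::size_t block, unsigned char *dst)
{
	for (std::size_t i = 0; i < BLOCK_BYTES; i++)
		dst[i] = static_cast<unsigned char>((block*BLOCK_BYTES + i) % 251);
}

// Model of one file served through a fixed-size backing store
class Paged_file_model
{
	private:

		std::vector<unsigned char>         _backing;    // backing store
		std::map<std::size_t, std::size_t> _attached;   // block -> frame
		std::deque<std::size_t>            _fifo;       // oldest block first
		std::size_t                        _frames;     // must be > 0
		unsigned                           _device_reads = 0;

		std::size_t _handle_fault(std::size_t block)
		{
			std::size_t frame = 0;

			if (_attached.size() < _frames) {
				// backing store not yet full, use the next free frame
				frame = _attached.size();
			} else {
				// backing store crowded: detach the oldest block
				// and reuse its frame
				std::size_t const victim = _fifo.front();
				_fifo.pop_front();
				frame = _attached[victim];
				_attached.erase(victim);
			}

			read_block(block, &_backing[frame*BLOCK_BYTES]);
			++_device_reads;

			_attached[block] = frame;
			_fifo.push_back(block);
			return frame;
		}

	public:

		explicit Paged_file_model(std::size_t frames)
		: _backing(frames*BLOCK_BYTES), _frames(frames) { }

		// Models a client access to the attached managed dataspace
		unsigned char read(std::size_t offset)
		{
			std::size_t const block = offset / BLOCK_BYTES;

			auto it = _attached.find(block);
			std::size_t const frame = (it != _attached.end())
			                        ? it->second : _handle_fault(block);

			return _backing[frame*BLOCK_BYTES + offset % BLOCK_BYTES];
		}

		unsigned    device_reads()    const { return _device_reads; }
		std::size_t attached_blocks() const { return _attached.size(); }
};
```

With only two frames of backing store, the model still serves reads at
arbitrary offsets. The number of device reads follows the access pattern
of the client, and the client never sees which blocks are resident.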

Another use case of managed dataspaces is the handling of the thread
context area. You may also take a look at the following example:

  base/src/test/rm_nested
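
The nesting itself, an RM session that is attached as a dataspace
within another RM session, can be pictured with the standalone model
below. It does not reflect the real interfaces or the content of the
test above. All type names are hypothetical, dataspaces are reduced to
integer IDs, and a result of -1 stands for "an access here would raise a
page fault".

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <utility>

// Hypothetical common interface of everything that can be attached
struct Dataspace_model
{
	virtual ~Dataspace_model() { }

	// ID of the RAM backing 'offset', or -1 if an access would fault
	virtual int resolve(std::size_t offset) const = 0;

	virtual std::size_t size() const = 0;
};

// Plain RAM dataspace, fully backed by physical memory
struct Ram_dataspace_model : Dataspace_model
{
	int id; std::size_t bytes;

	Ram_dataspace_model(int id, std::size_t bytes) : id(id), bytes(bytes) { }

	int resolve(std::size_t offset) const override
	{
		return offset < bytes ? id : -1;
	}

	std::size_t size() const override { return bytes; }
};

// Region map that can itself be attached as a (managed) dataspace
struct Managed_dataspace_model : Dataspace_model
{
	std::size_t bytes;
	std::map<std::size_t, std::shared_ptr<Dataspace_model>> regions;

	explicit Managed_dataspace_model(std::size_t bytes) : bytes(bytes) { }

	void attach_at(std::shared_ptr<Dataspace_model> ds, std::size_t base)
	{
		regions[base] = std::move(ds);
	}

	int resolve(std::size_t addr) const override
	{
		// find the region with the highest base not above 'addr'
		auto it = regions.upper_bound(addr);
		if (it == regions.begin())
			return -1;
		--it;

		// translate to a region-local offset and delegate
		std::size_t const offset = addr - it->first;
		return offset < it->second->size() ? it->second->resolve(offset) : -1;
	}

	std::size_t size() const override { return bytes; }
};
```

Attaching a sub region map at 0x40000 of an outer one, with a RAM
dataspace at offset 0x2000 inside the sub region map, makes the outer
address 0x42010 resolve to that RAM dataspace, whereas 0x41000 remains
unbacked and would be left to the pager of the sub region map.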

> Does anybody know whether Linux has a similar mechanism (i.e. reserve a
> chunk of physical memory for a virtual memory region before the first
> access to this region)? If not, what are the pros and cons of these two
> strategies?

I am not aware of a Linux mechanism for reserving physical memory. But
as Linux is used for real-time applications, I expect that such a
mechanism exists. However, I think that there is no equivalent to
managed dataspaces on Linux. The paging strategy resides in the kernel.

Best regards
Norman

-- 
Dr.-Ing. Norman Feske
Genode Labs

http://www.genode-labs.com · http://genode.org

Genode Labs GmbH · Amtsgericht Dresden · HRB 28424 · Sitz Dresden
Geschäftsführer: Dr.-Ing. Norman Feske, Christian Helmuth