Hi Chen,
When a dataspace is created, physical memory is reserved and cannot be used by other processes until the dataspace is destroyed. I wonder whether this could lead to situations where physical memory gets wasted. For example, an application may first allocate a huge chunk of memory (through malloc, which eventually leads to the creation of one or more dataspaces) but touch only a small portion of it throughout its entire execution. Another example would be a large binary of which a significant portion is never actually executed. Reserving physical memory for these instructions when the program is loaded may thus not be desirable, especially in a resource-constrained environment.
You've hit a very good point. Genode does not try to pretend that applications have an unlimited amount of physical resources at their disposal. The assignment, allocation, and trading of physical resources is always explicit. No physical resource can be allocated twice at the same time. This way, Genode guarantees the availability of a physical resource to an application after a successful allocation. This is not the case for traditional operating systems, which offer seemingly unlimited resources to each application and implement clever strategies to uphold this illusion as far as possible. Because the resources are limited, however, this illusion inevitably breaks at some point. Quality of service is sacrificed to achieve high utilization.
On Genode, the creation of a RAM dataspace implies the reservation of a physical memory range, regardless of how the dataspace is used afterwards. In the worst case, a huge dataspace may be created that is then only sparsely used. This happens in the malloc implementation of the FreeBSD libc, which issues an anonymous 'mmap' for a large memory area right at initialization time. On UNIX, this is no problem because the backing store for this virtual memory area gets allocated on demand. On Genode, however, the process must have a large RAM quota assigned in order to create the large dataspace, even though most of it may never be used. This problem can be addressed in two different ways: by avoiding sparsely populated dataspaces or by using managed dataspaces.
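To illustrate the point, here is a minimal sketch (assuming the classic 'env()'-based API of this Genode generation) showing that the quota is charged at allocation time, not at first access:

  #include <base/env.h>

  using namespace Genode;

  int main()
  {
    /* charges 16 MiB of physical RAM against our quota immediately,
       regardless of whether the memory is ever touched */
    Ram_dataspace_capability ds = env()->ram_session()->alloc(16*1024*1024);

    /* make the dataspace visible in the local address space */
    char *addr = env()->rm_session()->attach(ds);

    /* touching a single byte does not change the physical footprint -
       the whole range was already reserved at 'alloc' time */
    addr[0] = 1;
    return 0;
  }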
For the libc, we "fixed" the problem by replacing the original malloc implementation with a simple wrapper around Genode's 'env()->heap()', which has a more conservative backing-store allocation strategy and avoids the initial allocation of a large chunk of anonymous memory. This solution serves us well for our current use cases.
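As a rough idea of what such a wrapper looks like, here is a simplified sketch; the real wrapper additionally records each block's size so that 'free()' and 'realloc()' can hand it back to the allocator:

  #include <base/env.h>

  /* simplified sketch of a malloc front end that forwards allocations
     to Genode's heap instead of managing a large anonymous mapping */
  extern "C" void *malloc(Genode::size_t size)
  {
    void *addr = 0;
    return Genode::env()->heap()->alloc(size, &addr) ? addr : 0;
  }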
However, we expect cases where the approach of "fixing" the application does not work, or where over-provisioning of resources is a desired feature. I am thinking of the Noux UNIX emulation environment, which should allow the over-provisioning of resources to classical UNIX programs, in line with how a real UNIX kernel works. This is where managed dataspaces come into play. Each RM session can be paged on demand by installing a signal handler using the 'rm_session->fault_handler()' function. This way, the creator of the RM session receives a notification each time a page fault occurs that is not backed by a current dataspace attachment. It can then request the fault address and attach a dataspace at that address. Thereby, the page fault gets resolved and the faulting thread resumes its execution. For an example of how this works, please refer to:
base/src/test/rm_fault
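To make the mechanism more concrete, here is a minimal sketch of such a fault-handling loop, assuming the classic 'Rm_connection' / 'Signal_receiver' API; it simply backs each faulting page with a freshly allocated RAM dataspace:

  #include <base/env.h>
  #include <base/signal.h>
  #include <rm_session/connection.h>

  using namespace Genode;

  int main()
  {
    enum { MANAGED_SIZE = 1024*1024, PAGE_SIZE = 4096 };

    /* RM session to be paged on demand, covering a 1 MiB region */
    Rm_connection rm(0, MANAGED_SIZE);

    /* register a signal handler for page faults within that region */
    static Signal_receiver sig_rec;
    static Signal_context  sig_ctx;
    rm.fault_handler(sig_rec.manage(&sig_ctx));

    for (;;) {

      /* block until a fault occurs that is not backed by a dataspace */
      sig_rec.wait_for_signal();

      /* request the fault address from the RM session */
      Rm_session::State state = rm.state();
      addr_t const fault_page = state.addr & ~(addr_t)(PAGE_SIZE - 1);

      /* allocate backing store and attach it at the fault address,
         which resolves the fault and resumes the faulting thread */
      Ram_dataspace_capability ds = env()->ram_session()->alloc(PAGE_SIZE);
      rm.attach_at(ds, fault_page);
    }
  }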
This mechanism alone would suffice to execute a child process that is completely on-demand-paged by the parent (the parent would just need to install itself as fault handler for the child's RM session). So classical UNIX semantics could be emulated. But there is another handy twist to it, namely managed dataspaces.
A managed dataspace is an RM session used as a dataspace. The RM session interface has a 'dataspace' function, which returns a dataspace capability. This capability can be used to attach the address space described by the RM session as a region to any other RM session. Thereby, it becomes possible to employ different on-demand paging strategies for different parts of an address space. I admit this sounds a bit esoteric, so let's look at a practical example, which comes in the form of the iso9660 server:
os/src/server/iso9660
This server offers the content of a CD-ROM as individual ROM sessions. By simply routing ROM session requests to the iso9660 server instead of core, programs can access files from CD-ROM instead of core's boot modules in a completely transparent manner. A ROM session carries a donation of just a few KB of memory, which is not enough to hold the content of an arbitrarily sized file stored on the CD-ROM. The solution is to implement the ROM service using managed dataspaces. For each client, a managed dataspace is created, which is initially not backed by physical memory at all. But once the client has attached the dataspace and starts accessing it, page faults come in. The iso9660 server handles those page faults by reading the corresponding blocks from the block device into a RAM dataspace (the backing store) and attaching the corresponding part of the backing store at the fault address within the managed dataspace. If the backing store becomes crowded, the iso9660 server makes room by detaching evicted parts from the managed dataspace. Consequently, the client can open files of any size, and the block-device access follows the client's access pattern. For the client, the underlying mechanism remains completely transparent.
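A condensed sketch of this per-client fault-handling policy might look as follows; the window size, the 'handle_fault' helper, and the omitted block-device read are placeholders of mine, and the classic RM-session interface is assumed:

  /* sketch of the per-client fault handling: 'managed_rm' is the RM
     session handed out as ROM dataspace, 'backing_store' a RAM dataspace
     used as cache for blocks read from the CD-ROM */
  void handle_fault(Genode::Rm_session          &managed_rm,
                    Genode::Dataspace_capability backing_store,
                    Genode::off_t                backing_offset)
  {
    using namespace Genode;

    enum { WINDOW_SIZE = 4096 };

    /* determine the address that caused the fault */
    Rm_session::State state = managed_rm.state();
    addr_t const fault_page = state.addr & ~(addr_t)(WINDOW_SIZE - 1);

    /* read the corresponding blocks from the block device into the
       backing store at 'backing_offset' (omitted here) ... */

    /* ... and attach that part of the backing store at the fault
       address; if the backing store is crowded, a previously attached
       window would first be evicted via 'managed_rm.detach(...)' */
    managed_rm.attach_at(backing_store, fault_page, WINDOW_SIZE,
                         backing_offset);
  }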
Another use case of managed dataspaces is the handling of the thread context area. You may also take a look at the following example:
base/src/test/rm_nested
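The core idea of nesting boils down to a few lines, again assuming the classic API: the RM session is handed out as a dataspace and attached like any other dataspace:

  #include <base/env.h>
  #include <rm_session/connection.h>

  using namespace Genode;

  int main()
  {
    enum { MANAGED_SIZE = 4*1024*1024 };

    /* RM session describing a 4 MiB virtual memory region */
    Rm_connection rm(0, MANAGED_SIZE);

    /* the RM session can be handed out as a dataspace ... */
    Dataspace_capability managed_ds = rm.dataspace();

    /* ... and attached like any other dataspace - here into the local
       address space, but it could equally be attached to another RM
       session, yielding nested on-demand-paged regions */
    void *ptr = env()->rm_session()->attach(managed_ds);

    /* accesses within 'ptr' trigger faults that are delivered to the
       fault handler of 'rm' (see the earlier rm_fault sketch) */
    (void)ptr;
    return 0;
  }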
Does anybody know whether Linux has a similar mechanism (i.e., reserving a chunk of physical memory for a virtual memory region before the first access to this region)? If not, what are the pros and cons of these two strategies?
I am not aware of a Linux mechanism for reserving physical memory. But as Linux is used for real-time applications, I expect that such a mechanism exists. However, I think there is no equivalent to managed dataspaces on Linux, as the paging strategy resides in the kernel.
Best regards
Norman