Hi,
When a data space is created, physical memory is reserved and cannot be used by other processes until the data space is destroyed. I wonder whether this can lead to situations where physical memory is wasted. For example, an application may first allocate a huge chunk of memory (through malloc, which will eventually lead to the creation of one or more data spaces), but only touch a small portion of it throughout the entire execution. Another example would be a large binary of which a significant portion is never actually executed. Thus, reserving physical memory for these instructions when a program is loaded may not be good, especially in a resource-constrained environment.
Does anybody know whether Linux has a similar mechanism (i.e. reserve a chunk of physical memory for a virtual memory region before the first access to this region)? If not, what are the pros and cons of these two strategies?
Thank you.
Best, Chen
Hi Chen,
> When a data space is created, physical memory is reserved and cannot be used by other processes until the data space is destroyed. I wonder whether this can lead to situations where physical memory is wasted. For example, an application may first allocate a huge chunk of memory (through malloc, which will eventually lead to the creation of one or more data spaces), but only touch a small portion of it throughout the entire execution. Another example would be a large binary of which a significant portion is never actually executed. Thus, reserving physical memory for these instructions when a program is loaded may not be good, especially in a resource-constrained environment.
you've hit a very good point. Genode does not pretend to offer applications an unlimited amount of physical resources. The assignment, allocation, and trading of physical resources are always explicit. No physical resource can be allocated twice at the same time. This way, Genode guarantees the availability of a physical resource to an application after a successful allocation. This is not the case for traditional operating systems, which offer seemingly unlimited resources to each application and implement clever strategies to uphold this illusion as far as possible. Because the resources are limited, however, this illusion will inevitably break at some point. Quality of service is sacrificed to achieve high utilization.
On Genode, the creation of a RAM dataspace implies the reservation of a physical memory range regardless of how this dataspace is used afterwards. In the worst case, a huge dataspace may be created, which is then only sparsely used. This happens in the malloc implementation of the FreeBSD libc, which issues an anonymous 'mmap' for a large memory area right at initialization time. On UNIX, this is no problem because the backing store for this virtual memory area gets allocated on demand. On Genode, however, the process must be assigned a large RAM quota in order to create the large dataspace, even though it never uses most of it. This problem can be addressed in two ways: by avoiding sparsely populated dataspaces or by using managed dataspaces.
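To make the difference between the two policies concrete, here is a small Python model. It is only an illustration and not Genode code: the names `RamQuota`, `eager_dataspace`, and `LazyRegion` are invented for this sketch, and the 4 KiB page size is an assumption.

```python
PAGE = 4096  # assumed page size


class QuotaExceeded(Exception):
    """Raised when a charge would exceed the assigned RAM quota."""


class RamQuota:
    """Hypothetical per-process RAM budget."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def charge(self, size):
        if self.used + size > self.limit:
            raise QuotaExceeded(f"need {size}, have {self.limit - self.used}")
        self.used += size


def eager_dataspace(quota, size):
    """Charge the whole range at creation time, touched or not."""
    quota.charge(size)
    return size


class LazyRegion:
    """Charge one page at a time, on the first touch of that page."""

    def __init__(self, quota, size):
        self.quota = quota
        self.size = size
        self.touched = set()

    def touch(self, offset):
        if not 0 <= offset < self.size:
            raise IndexError(offset)
        page = offset // PAGE
        if page not in self.touched:
            # This can fail long after the region itself was "allocated".
            self.quota.charge(PAGE)
            self.touched.add(page)
```

With a 64 KiB quota, the eager request for 1 MiB is refused immediately, whereas the lazy region is accepted and only fails later if more than 16 pages are touched. That late failure is the broken illusion described above.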
For the libc, we "fixed" the problem by replacing the original malloc implementation with a simple wrapper around Genode's 'env()->heap()', which has a more conservative backing-store allocation strategy and avoids the initial allocation of a large chunk of anonymous memory. This solution serves us well for our current use cases.
However, we expect cases where the approach of "fixing" the application does not work, or where overprovisioning of resources is a desired feature. I am thinking about the Noux UNIX emulation environment, which should allow overprovisioning of resources to classical UNIX programs, along the lines of how a real UNIX kernel works. This is where managed dataspaces come into play. Each RM session can be paged on demand by installing a signal handler using the 'rm_session->fault_handler()' function. This way, the creator of the RM session receives a notification each time a page fault occurs at an address that is not backed by a dataspace attachment. It can then request the fault address and attach a dataspace at that address. Thereby, the page fault gets resolved and the faulting thread resumes its execution. For an example of how this works, please refer to:
base/src/test/rm_fault
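The fault-handling sequence described above (fault, notification, attach, resume) can be mimicked in a few lines of Python. This is a toy model under stated assumptions, not the Genode API: `RegionMap`, `UnresolvedFault`, and `make_pager` are hypothetical names, the handler is called synchronously instead of via a signal, and a page is 4 KiB.

```python
PAGE = 4096  # assumed page size


class UnresolvedFault(Exception):
    """A fault that no handler resolved."""


class RegionMap:
    """Toy stand-in for an RM session: maps page numbers to dataspaces."""

    def __init__(self):
        self._attached = {}    # page number -> bytearray
        self._handler = None

    def fault_handler(self, handler):
        """Install a callback that is invoked on unbacked accesses."""
        self._handler = handler

    def attach(self, addr, dataspace):
        self._attached[addr // PAGE] = dataspace

    def _resolve(self, addr):
        page, offset = divmod(addr, PAGE)
        if page not in self._attached and self._handler is not None:
            self._handler(self, addr)          # "page fault" notification
        if page not in self._attached:
            raise UnresolvedFault(hex(addr))
        return self._attached[page], offset    # faulting access resumes

    def read(self, addr):
        dataspace, offset = self._resolve(addr)
        return dataspace[offset]

    def write(self, addr, value):
        dataspace, offset = self._resolve(addr)
        dataspace[offset] = value


def make_pager(fault_log):
    """Return a handler that backs each faulting page with fresh memory."""
    def pager(region_map, fault_addr):
        fault_log.append(fault_addr)
        page_base = fault_addr - fault_addr % PAGE
        region_map.attach(page_base, bytearray(PAGE))
    return pager
```

In this model, the first access to a page triggers exactly one call to the pager, and later accesses to the same page proceed without involving it.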
This mechanism alone would suffice to execute a child process that is completely on-demand-paged by the parent (the parent would just need to install itself as fault handler for the child's RM session). So classical UNIX semantics could be emulated. But there is another handy twist to it, namely managed dataspaces.
A managed dataspace is an RM session used as a dataspace. The RM session interface has a 'dataspace' function, which returns a dataspace capability. This capability can be used to attach the address space described by the RM session as a region to any other RM session. Thereby, it becomes possible to employ different on-demand paging strategies for different parts of the address space. I admit, this sounds a bit esoteric, so let's look at a practical example, which comes in the form of the iso9660 server:
os/src/server/iso9660
This server offers the content of a CD-ROM as individual ROM sessions. By simply routing ROM session requests to iso9660 instead of core, programs can access files from CD-ROM instead of core's boot modules in a completely transparent manner. A ROM session carries a donation of just a few KB of memory, which is not enough to hold the content of an arbitrarily sized file stored on the CD-ROM. The solution is to implement the ROM service using managed dataspaces. For each client, a managed dataspace is created. Initially, it is not backed by physical memory at all. But once the client has attached the dataspace and starts accessing it, page faults come in. The iso9660 server handles those page faults by reading the corresponding blocks from the block device into a RAM dataspace (backing store) and attaching the part of the backing store at the fault address to the managed dataspace. If the backing store becomes crowded, the iso9660 server makes room by detaching the evicted parts from the managed dataspace. Consequently, the client can open files of any size. The block device access follows the access pattern of the client. For the client, the underlying mechanism remains completely transparent.
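The paging scheme described for iso9660 can be sketched as a small Python model. This is not the server's code: `PagedFile` and its `read_block` callback are hypothetical, the page size is assumed to be 4 KiB, and the oldest-first eviction is an assumption, since the text above does not say which policy the server uses.

```python
from collections import OrderedDict

PAGE = 4096  # assumed page size


class PagedFile:
    """Toy model of a file served through a bounded backing store."""

    def __init__(self, read_block, size, max_frames):
        self.read_block = read_block   # callable: page number -> bytes
        self.size = size
        self.max_frames = max_frames   # capacity of the backing store
        self.attached = OrderedDict()  # page number -> page content
        self.faults = 0

    def read(self, offset):
        if not 0 <= offset < self.size:
            raise IndexError(offset)
        page = offset // PAGE
        if page not in self.attached:
            self._handle_fault(page)
        return self.attached[page][offset % PAGE]

    def _handle_fault(self, page):
        self.faults += 1
        if len(self.attached) >= self.max_frames:
            # Backing store is crowded: detach the oldest page first.
            self.attached.popitem(last=False)
        # Read the block from the "device" and attach it at the fault page.
        self.attached[page] = self.read_block(page)
```

The client side only ever calls `read`, so the size of the file it can open is independent of the number of frames in the backing store. A smaller backing store merely leads to more faults and more block reads.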
Another use case of managed dataspaces is the handling of the thread context area. You may also take a look at the following example:
base/src/test/rm_nested
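The idea of attaching one address-space description into another can be illustrated with a short Python model. It is a sketch only and does not reflect the real RM session interface: `Ram` and `Region` are hypothetical classes, and an access to an unbacked offset simply raises `LookupError` instead of being reported to a fault handler.

```python
class Ram:
    """Leaf dataspace backed by real (here: Python) memory."""

    def __init__(self, size):
        self.size = size
        self._data = bytearray(size)

    def load(self, offset):
        return self._data[offset]

    def store(self, offset, value):
        self._data[offset] = value


class Region:
    """Address-space description that can itself be attached elsewhere."""

    def __init__(self, size):
        self.size = size
        self._attachments = []  # list of (base, dataspace)

    def attach(self, base, dataspace):
        if base < 0 or base + dataspace.size > self.size:
            raise ValueError("attachment does not fit into region")
        for other_base, other in self._attachments:
            if base < other_base + other.size and other_base < base + dataspace.size:
                raise ValueError("overlapping attachment")
        self._attachments.append((base, dataspace))

    def _lookup(self, offset):
        for base, dataspace in self._attachments:
            if base <= offset < base + dataspace.size:
                return dataspace, offset - base
        raise LookupError(f"fault at offset {offset:#x}")

    def load(self, offset):
        dataspace, relative = self._lookup(offset)
        return dataspace.load(relative)    # recurses through nested regions

    def store(self, offset, value):
        dataspace, relative = self._lookup(offset)
        dataspace.store(relative, value)
```

Because `Region` offers the same `size`, `load`, and `store` as `Ram`, a region can be attached wherever a plain dataspace can. For example, a 64 KiB region holding one RAM page at offset 0x2000 can be attached at 0x40000 in a larger region, and an access to 0x42010 in the outer region then lands at offset 0x10 of that RAM page.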
> Does anybody know whether Linux has a similar mechanism (i.e. reserve a chunk of physical memory for a virtual memory region before the first access to this region)? If not, what are the pros and cons of these two strategies?
I am not aware of a Linux mechanism for reserving physical memory. But as Linux is used for real-time applications, I expect that there exists a mechanism. However, I think that there is no equivalent to managed dataspaces on Linux. The paging strategy resides in the kernel.
Best regards Norman
On 06/10/2011 03:48 PM, Norman Feske wrote:
>> Does anybody know whether Linux has a similar mechanism (i.e. reserve a chunk of physical memory for a virtual memory region before the first access to this region)? If not, what are the pros and cons of these two strategies?
> I am not aware of a Linux mechanism for reserving physical memory. But as Linux is used for real-time applications, I expect that there exists a mechanism.
I think mmap()+mlock() do exactly that.
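As a rough illustration of this combination, the following Python sketch maps anonymous memory and then asks the kernel to pin it with mlock(2), which faults the pages in and keeps them resident. It assumes a POSIX system; `map_and_lock` is a name made up for this example, and the libc call is made through ctypes because Python's mmap module does not expose mlock. The call may be refused, for instance when RLIMIT_MEMLOCK is too small, so the sketch reports success as a flag instead of assuming it.

```python
import ctypes
import ctypes.util
import mmap


def map_and_lock(length):
    """Map anonymous memory and try to pin it in RAM.

    Returns (mapping, locked). 'locked' is False if mlock() was refused.
    """
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.mlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
    libc.mlock.restype = ctypes.c_int

    # Anonymous mapping: only virtual address space is set up at this point.
    mapping = mmap.mmap(-1, length)

    # Obtain the start address of the mapping for the libc call.
    view = ctypes.c_char.from_buffer(mapping)
    try:
        rc = libc.mlock(ctypes.addressof(view), length)
    finally:
        # Drop the buffer export so the mapping can be closed later.
        del view
    return mapping, rc == 0
```

Closing the mapping unmaps the range, which also releases the lock.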
Regards, Julian
Thanks, Julian. Good to know.
Best, Chen
-----Original Message----- From: Julian Stecklina [mailto:js@...14...] Sent: Friday, June 10, 2011 8:23 AM To: genode-main@lists.sourceforge.net Subject: Re: Data space creation and physical memory allocation strategy.
Nice. I think managed dataspaces combined with the fault-handler mechanism are very powerful. For example, given a small portion of physical memory, we can still write an application that creates an illusion of unlimited resources for each client. To do that, basically we need to allocate a "real" dataspace (i.e. get physical memory from backing store) and do paging upon a page fault. When the application runs out of its RAM, we can do a swap (a file system is probably needed for swapping), which is what Linux is doing.
Now I understand that one of the advantages of reserving physical memory is that no matter what kind of memory allocation scheme one application is using, other applications are never affected.
BTW, it looks like the current heap implementation does not use managed dataspaces, and therefore every malloc still leads to a physical memory allocation, right?
Best, Chen
-----Original Message----- From: Norman Feske [mailto:norman.feske@...1...] Sent: Friday, June 10, 2011 6:49 AM To: genode-main@lists.sourceforge.net Subject: Re: Data space creation and physical memory allocation strategy.
Hello Chen,
> resources for each client. To do that, basically we need to allocate a "real" dataspace (i.e. get physical memory from backing store) and do paging upon a page fault. When the application runs out of its RAM, we can do a swap (a file system is probably needed for swapping), which is what Linux is doing.
exactly. But the possibilities go even further. For example, on a NUMA system, a special memory manager could provide a RAM service that migrates dataspaces transparently between local and non-local memory. Another use case would be a memory manager with support for large non-contiguous memory areas.
That said, the concept of managed dataspaces is not time-tested yet. We still need to develop a thorough understanding of its use cases and possibly improve it. Right now, there is one important thing to keep in mind: each managed dataspace is a separate RM session. Therefore, a managed dataspace is not exactly cheap. Its creation traverses the process tree (in contrast to the allocation of RAM dataspaces), and each RM session must be paid for in terms of a quota donation. Hence, a managed dataspace makes sense for large memory objects but not as a means to manage containers of just a few memory pages.
As another limitation, this concept is not functional on 'base-linux' because Linux does not allow the manipulation of remote address spaces (at least, we do not know how to do it efficiently).
> BTW, it looks like the current heap implementation does not use managed dataspaces, and therefore every malloc still leads to a physical memory allocation, right?
A process cannot know whether its 'env()->ram_session()' (which is used as backing store by 'env()->heap()') refers to core's RAM service or not. The RAM session of the process could have been routed to another implementation of the RAM-session interface (e.g., a swapping memory manager). For all current Genode scenarios, your observation is correct: each process indeed uses core's RAM service and thereby physical memory. But if we had an alternative RAM service implementation, we could make a process use this service by simply routing its RAM session to the alternative RAM service rather than to the parent. No code modifications required.
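The point about routing can be illustrated with a small Python model in which the client code is identical for both services. Everything here is hypothetical: `CoreRam` stands in for core's RAM service, `SwappingRam` for an imagined alternative that does not exist in Genode today, and the "swapping" is pure bookkeeping without any block device.

```python
class CoreRam:
    """Stand-in for core's RAM service: every byte is backed immediately."""

    def __init__(self, quota):
        self.quota = quota
        self.resident = 0

    def alloc(self, size):
        if self.resident + size > self.quota:
            raise MemoryError("RAM quota exceeded")
        self.resident += size
        return bytearray(size)


class SwappingRam:
    """Imagined alternative service that overcommits its quota."""

    def __init__(self, quota):
        self.quota = quota
        self.resident = 0
        self.swapped = 0

    def alloc(self, size):
        # Account for anything beyond the quota as swapped out. A real
        # implementation would have to move page contents to storage.
        overflow = max(0, self.resident + size - self.quota)
        self.resident += size - overflow
        self.swapped += overflow
        return bytearray(size)


def client(ram, chunks=8, chunk_size=4096):
    """Client code: it only sees an object with an alloc() method."""
    return [ram.alloc(chunk_size) for _ in range(chunks)]
```

Handing `client` a `CoreRam` with a 16 KiB quota makes the fifth allocation fail, whereas the same unmodified `client` runs to completion on a `SwappingRam` with the same quota. Which object the client receives corresponds to the routing decision, which is made outside of the client.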
Best regards Norman
Thanks Norman. I think I have a better understanding now. :)
-----Original Message----- From: Norman Feske [mailto:norman.feske@...1...] Sent: Monday, June 13, 2011 10:51 AM To: genode-main@lists.sourceforge.net Subject: Re: Data space creation and physical memory allocation strategy.