Hi,
I am wondering why Genode uses asynchronous signals to invoke custom nested page-fault handlers. Can someone explain why? It would seem more sensible to use synchronous IPC for this purpose.
Thanks Daniel
Hi Daniel,
I am wondering why Genode uses asynchronous signals to invoke custom nested page-fault handlers. Can someone explain why? It would seem more sensible to use synchronous IPC for this purpose.
the answer to this question is not just a matter of the communication mechanism used but a matter of trust relationships. If core employed a synchronous interface for reflecting page faults to the user land, it would make itself dependent on the proper operation of each dataspace manager involved. I.e., if core called a dataspace manager via synchronous IPC (let's say, invoking an RPC function 'resolve_fault'), it could not be sure that the call would ever return.
In contrast, by using asynchronous notifications, core hands out the information that something interesting happened as a fire-and-forget message to the dataspace manager. This way, core does not make itself dependent on any higher-level user-space component. The dataspace manager can respond to this signal by querying the page-fault information from core. This query can be done via synchronous IPC because the dataspace manager trusts core anyway.
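If it helps to picture the protocol, here is a rough, kernel-agnostic sketch in plain C++ (standard threads and condition variables instead of Genode's actual signal and RM interfaces; the names 'Core_model', 'submit_fault', and 'wait_and_query' are made up for the illustration). The only thing it is meant to show is the direction of the dependencies: core records the fault state and fires a notification without ever blocking, and the dataspace manager later pulls that state via a call that blocks only the manager itself:

  #include <condition_variable>
  #include <cstdint>
  #include <iostream>
  #include <mutex>
  #include <thread>

  struct Fault_state { bool pending; std::uintptr_t addr; bool write; };

  class Core_model
  {
    private:
      std::mutex              _mutex;
      std::condition_variable _signal;                    /* models the async notification */
      Fault_state             _state { false, 0, false }; /* fault info stays with core */

    public:
      /* called on behalf of a faulting thread: record the state, fire the
       * signal, and return immediately - core never waits for the manager */
      void submit_fault(std::uintptr_t addr, bool write)
      {
        {
          std::lock_guard<std::mutex> guard(_mutex);
          _state = Fault_state { true, addr, write };
        }
        _signal.notify_one();                             /* fire and forget */
      }

      /* called by the dataspace manager - it trusts core, so blocking in
       * this direction is harmless */
      Fault_state wait_and_query()
      {
        std::unique_lock<std::mutex> lock(_mutex);
        _signal.wait(lock, [&] { return _state.pending; });
        Fault_state result = _state;
        _state.pending = false;
        return result;
      }
  };

  int main()
  {
    Core_model core;

    /* dataspace manager: block on the signal, then pull the fault info */
    std::thread manager([&] {
      Fault_state fault = core.wait_and_query();
      std::cout << "resolve " << (fault.write ? "write" : "read")
                << " fault at 0x" << std::hex << fault.addr << std::endl;
      /* ...attach backing store at the fault address via core's interface... */
    });

    core.submit_fault(0x4000, true);                      /* returns immediately */
    manager.join();
  }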
I should mention that there exists an alternative design for implementing nested dataspaces based on synchronous IPC. This concept is commonly referred to as the "local region mapper". In this approach, the pager (called region mapper) of a process resides in the same address space as the process (the pager thread itself is paged by someone else). If any thread of the process (other than the pager) produces a page fault, a page-fault message is delivered to the local region mapper. The region mapper can then request flexpage mappings directly from a dataspace manager and receives map items in response via synchronous IPC.
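For comparison, here is an equally rough sketch of one iteration of such a region-mapper loop, again in plain C++ with made-up stand-ins ('wait_for_pagefault_ipc', 'request_mapping', 'reply_with_mapping') rather than any real L4 IPC bindings. The point is that the pager performs a synchronous call into the dataspace manager and the resulting mapping bypasses core entirely:

  #include <cstddef>
  #include <cstdint>
  #include <cstdio>

  struct Flexpage  { std::uintptr_t base; std::size_t size; };
  struct Fault_msg { std::uintptr_t addr; bool write; };

  /* stub: blocking receive of a page-fault IPC from a faulting thread */
  static Fault_msg wait_for_pagefault_ipc()
  {
    return Fault_msg { 0x5000, false };   /* pretend a thread faulted here */
  }

  /* stub: synchronous call into the dataspace manager - the pager (and with
   * it the whole process) is stuck until the manager answers */
  static Flexpage request_mapping(std::uintptr_t addr)
  {
    return Flexpage { addr & ~std::uintptr_t(0xfff), 0x1000 };
  }

  /* stub: the map item travels with the IPC reply and the kernel installs
   * the mapping - core never sees it */
  static void reply_with_mapping(Flexpage fp)
  {
    std::printf("map flexpage [0x%zx, +0x%zx)\n",
                (std::size_t)fp.base, fp.size);
  }

  int main()
  {
    /* one iteration of the local region-mapper loop */
    Fault_msg fault = wait_for_pagefault_ipc();
    Flexpage  page  = request_mapping(fault.addr);
    reply_with_mapping(page);
    return 0;
  }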
Even though the "local region manager" concept can be implemented on Genode (we did some prototyping in the past), we discarded the concept for the following reasons:
* The region mapper must possess a capability to communicate directly with the dataspace manager. On Genode, in contrast, managed dataspaces are entirely transparent to the process using them.
* The dataspace manager must possess a direct communication right to the user of its dataspaces (to send mappings via IPC). In contrast, on Genode, a dataspace manager does not need direct communication rights to anyone using its dataspaces. It interacts with core only.
* The local region mapper must be paged - so a special case for handling this thread is always needed.
* By sending flexpage mappings via synchronous IPC, memory mappings would get established without core knowing about them. As an ultimate consequence, the system would depend on an in-kernel mapping database for revoking these mappings later on (e.g., for regaining physical resources during the destruction of a process). I regard the in-kernel mapping database as the most unfortunate part of most L4 kernel designs. Genode is designed not to depend on such a kernel feature.
* (Somewhat related to the previous point) The local region mapper concept requires an IPC mechanism that supports the communication of memory mappings in addition to normal message payloads.
That said, the current state of Genode's managed-dataspace concept is not carved in stone. It is primarily designed for use cases that require a few fairly large managed dataspaces. You have to keep in mind that each managed dataspace is actually an RM session that must be paid for. If we see the concept of managed dataspaces picked up for implementing many small dataspaces, we should look for a more lightweight mechanism.
Coming back to your original question: What is your actual concern about using asynchronous notifications for implementing managed dataspaces? Have you come up with a clever idea for implementing a synchronous protocol instead? I would love to explore this.
Cheers Norman
Hello Norman, Thank you for your very thorough answer. I had missed the issue of trust relationships, so your approach makes sense in this context. I do wonder, though, whether a timeout on synchronous IPC would be nicer (if page-fault handling is slow, you might end up doing a lot of needless exception handling). I was looking at this because we are using the page-fault handlers as a point of synchronization for serializing access to shared data and thus need synchronous semantics. I can tell you more off-line if you want.
Thanks, Daniel
Hi Daniel,
I do wonder, though, whether a timeout on synchronous IPC would be nicer (if page-fault handling is slow, you might end up doing a lot of needless exception handling). I was looking at this because we are using the page-fault handlers as a point of synchronization for serializing access to shared data.
At first glance, IPC wall-clock timeouts look like a lovely thing. I have succumbed to this idea in the past and paid for it with a lot of pain and frustration. The problem is the choice of the timeout value. There are two options: too short and too long. In practice, both problems tend to appear in combination.
If a timeout is too short, the IPC message may get dropped, which is fatal in many cases. For example, when I was working on the DOpE GUI server, I used to deliver input events to clients via synchronous messages with timeouts. A timeout of 5 milliseconds looked like a reasonable value. However, in the presence of several busy threads in the ready queue - each consuming its whole time slice of 10 milliseconds - situations arose where the receiver thread had not had the slightest chance to pick up the IPC message. Consequently, the input event got dropped. Increasing the timeout pushes the problem a bit further away - until it appears again in another situation of high load. When it appears, it is extremely difficult to debug because the problem is hardly reproducible.
On the other hand, if a timeout is too long, the point of using a timeout in the first place vanishes. E.g., in the GUI example, by using a timeout of 50 milliseconds for delivering user input events, the operation of the server could be stalled for 50 milliseconds in the worst case - for each client. So if 10 clients misbehave, the GUI would freeze for half a second, which is unacceptable. Large timeouts make the system prone to jitter and harm its deterministic behaviour.
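Just to put rough numbers on the two failure modes (the time slice, timeouts, and client count are the values from the examples above; the number of busy threads is made up):

  #include <iostream>

  int main()
  {
    /* too short: the receiver is starved behind busy threads */
    int const timeout_ms    = 5;    /* send timeout from the example */
    int const timeslice_ms  = 10;   /* kernel time slice from the example */
    int const busy_threads  = 3;    /* assumed number of busy threads */
    int const worst_wait_ms = busy_threads * timeslice_ms;

    std::cout << "receiver may get scheduled only after " << worst_wait_ms
              << " ms, send timeout is " << timeout_ms
              << " ms -> message dropped\n";

    /* too long: per-client stalls accumulate in the server */
    int const long_timeout_ms     = 50;
    int const misbehaving_clients = 10;
    std::cout << "worst-case GUI stall: "
              << long_timeout_ms * misbehaving_clients << " ms\n";
    return 0;
  }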
The situation for reflecting page faults to the user land is very similar to the GUI example. Timeouts in the presence of multiple dataspace managers would accumulate. On the other hand, if a message were dropped, a page fault would remain unresolved forever.
There is only one sensible use case for IPC timeouts, which is using IPC as a time source on platforms where only one hardware timer exists. Because the kernel needs a timer for preempting user threads, the kernel allocates the timer for itself. Hence, there would be no timer left for the user land. In this scenario, IPC timeouts come in handy. Our timer drivers on Fiasco.OC, Pistachio, and L4/Fiasco make use of this feature. In my opinion, all other uses of IPC timeouts are bugs. .-)
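The idea boils down to something like the following, modelled here with plain C++ instead of the actual kernel bindings ('msleep' and the never-signalled endpoint are just illustrative stand-ins): the timer driver blocks on a receive operation that can only ever terminate via its timeout.

  #include <chrono>
  #include <condition_variable>
  #include <iostream>
  #include <mutex>

  /* sleep implemented as a blocking receive that can only end via its timeout */
  static void msleep(unsigned ms)
  {
    std::mutex                   dummy_lock;
    std::condition_variable      never_signalled;   /* stands in for an idle IPC endpoint */
    std::unique_lock<std::mutex> lock(dummy_lock);

    never_signalled.wait_for(lock, std::chrono::milliseconds(ms),
                             [] { return false; }); /* no sender will ever wake us */
  }

  int main()
  {
    auto const start = std::chrono::steady_clock::now();
    msleep(100);
    auto const elapsed_ms = std::chrono::duration_cast<std::chrono::milliseconds>(
      std::chrono::steady_clock::now() - start).count();
    std::cout << "slept for roughly " << elapsed_ms << " ms\n";
    return 0;
  }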
Cheers Norman