Asynchronous Nested Page Fault Handling

Tue Sep 13 23:36:43 CEST 2011

Hi Daniel,

> I do wonder if a timeout on synchronous IPC would be nicer (if you have
> a slow page fault handling you might be doing a lot of needless
> exception handling).  I was looking at this because we are using the

At first glance, IPC wall-clock timeouts look like a lovely thing. I
have succumbed to this idea in the past and paid for it with a lot of
pain and frustration. The problem is the choice of the timeout value.
There are two options: too short and too long. In most cases, the
chosen value turns out to be both at once.

If a timeout is too short, the IPC message may get dropped, which is
fatal in many cases. For example, when I was working on the DOpE GUI
server, I used to deliver input events to clients via synchronous
messages and timeouts. A timeout of 5 milliseconds looked like a
reasonable value. However, in the presence of several busy threads in
the ready queue, each consuming its whole time slice of 10
milliseconds, situations arose where the receiver thread did not have
the slightest chance to pick up the IPC message. Consequently, the
input event would get dropped. Increasing the timeout pushes the
problem a bit further away, until it appears again in another situation
of high load. When it appears, it is extremely difficult to debug
because the problem is almost impossible to reproduce.
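
To make the arithmetic concrete, here is a minimal sketch in Python. It
assumes a plain round-robin scheduler in which the receiver sits behind
all busy threads in the ready queue and each of them burns its full
time slice. The function names are made up for illustration and do not
correspond to any kernel or DOpE interface.

```python
def worst_case_wait_ms(busy_threads: int, time_slice_ms: int) -> int:
    """Time until the receiver gets the CPU if every busy thread ahead
    of it uses its whole time slice (round-robin is an assumption)."""
    return busy_threads * time_slice_ms


def message_dropped(timeout_ms: int, busy_threads: int,
                    time_slice_ms: int) -> bool:
    """The send times out if the receiver is scheduled too late."""
    return worst_case_wait_ms(busy_threads, time_slice_ms) > timeout_ms


# 5 ms timeout, 10 ms time slices, growing load
for busy in range(4):
    print(busy, worst_case_wait_ms(busy, 10), message_dropped(5, busy, 10))
```

Under these assumptions, a single busy thread is already enough to
defeat the 5 ms timeout. A 50 ms timeout survives up to five busy
threads and fails with the sixth, which is the "pushed a bit further
away" effect.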

On the other hand, if a timeout is too long, the point of using a
timeout in the first place vanishes. E.g. for the GUI example, by using
a timeout of 50 milliseconds for delivering user input events, the
operation of the server could be stalled for 50 milliseconds in the
worst case - for each client. So if 10 clients misbehave, the GUI would
freeze for half a second, which is unacceptable. Large timeouts make the
system prone to jitter and harm the deterministic behaviour of the system.
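
The worst case can be written down as a one-liner. This is only a
sketch, assuming that the server delivers events to its clients one
after another and blocks for the full timeout on each client that does
not pick up its message. The function name is hypothetical.

```python
def worst_case_stall_ms(misbehaving_clients: int, timeout_ms: int) -> int:
    """Total time the server is blocked if it waits out the full
    timeout for each misbehaving client in turn (assumed sequential)."""
    return misbehaving_clients * timeout_ms


# 10 misbehaving clients, 50 ms timeout each
print(worst_case_stall_ms(10, 50), "ms")
```

The stall grows linearly with the number of unresponsive clients, so
the server's latency is dictated by the least trustworthy of them.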

The situation for reflecting page faults to the user land is very
similar to the GUI example. With multiple dataspace managers, the
timeouts would accumulate. On the other hand, if a message were
dropped, the page fault would remain unresolved forever.

There is only one sensible use case of IPC timeouts, which is using IPC
as a time source on platforms where only one hardware timer exists.
Because the kernel needs a timer for preempting user threads, the kernel
allocates the timer for itself. Hence, there would be no timer left for
the user land. In this scenario, IPC timeouts come in handy. Our timer
drivers on Fiasco.OC, Pistachio, and L4/Fiasco make use of this feature.
In my opinion, all other uses of IPC timeouts are bugs. .-)
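
The idea of using a receive timeout as a time source can be illustrated
with a loose user-level analogy in Python. This is not how the timer
drivers mentioned above are implemented. It merely assumes that
blocking on a channel nobody ever sends to, with a timeout, behaves
like a sleep. The standard-library queue stands in for an IPC endpoint,
and the function name is made up.

```python
import queue
import time


def sleep_via_receive_timeout(duration_s: float) -> None:
    """Block on a receive that can never succeed, so that the timeout
    itself acts as the timer."""
    silent_channel = queue.Queue()  # stand-in for an IPC endpoint; no sender
    try:
        silent_channel.get(timeout=duration_s)
    except queue.Empty:
        pass  # the expected outcome: the timeout is the timer tick


start = time.monotonic()
sleep_via_receive_timeout(0.05)
elapsed = time.monotonic() - start
print(f"woke up after {elapsed * 1000:.1f} ms")
```

Here the timeout expiring is the intended result rather than an error
path, which is what sets this use apart from the delivery cases above.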

Cheers
Norman

-- 
Dr.-Ing. Norman Feske
Genode Labs

http://www.genode-labs.com · http://genode.org

Genode Labs GmbH · Amtsgericht Dresden · HRB 28424 · Sitz Dresden
Geschäftsführer: Dr.-Ing. Norman Feske, Christian Helmuth