Hi Daniel,
I do wonder if a timeout on synchronous IPC would be nicer (if you have a slow page fault handling you might be doing a lot of needless exception handling). I was looking at this because we are using the
At first glance, IPC wall-clock timeouts look like a lovely thing. I have succumbed to this idea in the past and paid for it with a lot of pain and frustration. The problem is the choice of the timeout value. There are two options: too short and too long. Most of the time, you end up with both at once.
If a timeout is too short, the IPC message may get dropped, which is fatal in many cases. For example, when I was working on the DOpE GUI server, I used to deliver input events to clients via synchronous messages and timeouts. A timeout of 5 milliseconds looked like a reasonable value. However, with several busy threads in the ready queue, each consuming its whole time slice of 10 milliseconds, situations arose where the receiver thread did not have the slightest chance to pick up the IPC message. Consequently, the input event would get dropped. Increasing the timeout only pushes the problem a bit further away, until it appears again in another situation of high load. When it appears, it is extremely difficult to debug because the problem is almost impossible to reproduce.
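To put rough numbers on this, here is a small back-of-the-envelope model in Python. It is only a sketch under simplifying assumptions of mine: plain round-robin scheduling, the receiver queued behind every busy thread, and each busy thread using its full time slice. The function names are made up for illustration and are not part of any kernel API.

```python
def worst_case_wait_ms(busy_threads, time_slice_ms=10):
    # Assumption: the receiver sits behind every busy thread in the
    # ready queue, and each of them consumes its whole time slice.
    return busy_threads * time_slice_ms


def message_dropped(timeout_ms, busy_threads, time_slice_ms=10):
    # The send times out if the receiver cannot run before the timeout expires.
    return worst_case_wait_ms(busy_threads, time_slice_ms) > timeout_ms


if __name__ == "__main__":
    for busy in (0, 1, 3, 6):
        print(busy, "busy threads:",
              "wait", worst_case_wait_ms(busy), "ms,",
              "dropped with 5 ms timeout:", message_dropped(5, busy), ",",
              "dropped with 50 ms timeout:", message_dropped(50, busy))
```

In this model, a single busy thread is already enough to defeat the 5 ms timeout, and raising the timeout to 50 ms merely moves the failure to six busy threads.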
On the other hand, if a timeout is too long, the point of using a timeout in the first place vanishes. In the GUI example, with a timeout of 50 milliseconds for delivering user input events, the server could be stalled for up to 50 milliseconds by each client. So if 10 clients misbehave, the GUI would freeze for half a second, which is unacceptable. Large timeouts make the system prone to jitter and harm its deterministic behaviour.
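The accumulation is simple arithmetic, sketched here in Python. I assume the server delivers events to its clients one after another and that every misbehaving client runs into the full timeout; the function name is hypothetical.

```python
def worst_case_stall_ms(misbehaving_clients, timeout_ms):
    # Assumption: events are delivered sequentially, so the timeouts
    # of all unresponsive clients add up.
    return misbehaving_clients * timeout_ms


if __name__ == "__main__":
    for clients in (1, 5, 10):
        print(clients, "misbehaving clients ->",
              worst_case_stall_ms(clients, 50), "ms stall")
```

The same sum applies to any server that talks to several parties in sequence, which is why long timeouts do not scale with the number of clients.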
The situation for reflecting page faults to user land is very similar to the GUI example. With multiple dataspace managers involved, the timeouts would accumulate. On the other hand, if a message got dropped, the page fault would remain unresolved forever.
There is only one sensible use case for IPC timeouts, which is using IPC as a time source on platforms where only one hardware timer exists. Because the kernel needs a timer for preempting user threads, it allocates that timer for itself. Hence, there would be no timer left for user land. In this scenario, IPC timeouts come in handy. Our timer drivers on Fiasco.OC, Pistachio, and L4/Fiasco make use of this feature. In my opinion, all other uses of IPC timeouts are bugs. .-)
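The idea behind using IPC as a time source can be illustrated with a Python analogy. This is not the L4 API and not how our timer drivers are written; `queue.Queue` merely stands in for an IPC endpoint that nobody ever sends to, and `ipc_sleep` is a made-up name. The blocking receive is expected to fail, and the expiry of its timeout is the only thing we care about.

```python
import queue
import time


def ipc_sleep(duration_s):
    # Stand-in for an IPC endpoint without any sender (assumption for
    # illustration only).
    silent_endpoint = queue.Queue()
    try:
        # Blocking receive with timeout. No message will ever arrive.
        silent_endpoint.get(timeout=duration_s)
    except queue.Empty:
        # The timeout expired, which is the "timer tick" we wanted.
        pass


if __name__ == "__main__":
    start = time.monotonic()
    ipc_sleep(0.05)
    print("slept for about", round(time.monotonic() - start, 3), "seconds")
```

Here the timeout is not guarding against a slow communication partner at all, which is what separates this use from the problematic ones above.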
Cheers Norman