thread cancel_blocking usage, kernels

Norman Feske norman.feske at ...1...
Tue Oct 17 11:32:18 CEST 2017


Hi Steve,

On 12.10.2017 00:26, Steven Harp wrote:
> The Genode book suggests that an RPC caller can protect itself from blocking
> in a stalled server by creating a watchdog thread to monitor the process of
> the call, and cancel it if it takes too long.
> 
> Is there a robust/canonical example of using cancel_blocking in this way?

I am afraid that the book misled you towards an outdated direction. The
cancel-blocking mechanism was introduced very early at a time when we
routinely designed inter-component interfaces that were blocking at the
server side. At that time, L4 kernels did not support any means of
asynchronous notifications, thereby luring us into this direction.
Later, we realized this mistake and successively redesigned the
interfaces [1] to use a combination of synchronous RPCs that immediately
return and asynchronous notifications for blocking at the client side.
We announced this transition in May last year [2] and finished it in May
this year.

[1] https://genode.org/documentation/release-notes/13.02#Timer_interface_turned_into_asynchronous_mode_of_operation
[2] https://genode.org/documentation/release-notes/16.05#The_great_API_renovation
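The following minimal C++ sketch illustrates the "RPC returns immediately, client blocks on its own notification" pattern. It does not use the Genode API. `Signal_receiver`, `Timer_server`, and `trigger_once` are hypothetical names that stand in for a signal receiver and a timer-like service, and standard threads stand in for component boundaries. The server's function only schedules work and returns. The client then waits on its own receiver, where it alone decides how long to wait.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a signal receiver owned by the client: the
// client blocks here, on its own object, never inside the server.
class Signal_receiver
{
    std::mutex              _mutex;
    std::condition_variable _cond;
    unsigned                _pending = 0;

  public:

    // Called from the server side: deliver one notification, never blocks
    // on the client beyond the short critical section.
    void submit()
    {
        {
            std::lock_guard<std::mutex> guard(_mutex);
            _pending++;
        }
        _cond.notify_one();
    }

    // Called by the client: wait at most 'timeout' for a notification.
    // Returns false if none arrived in time.
    bool wait_for(std::chrono::milliseconds timeout)
    {
        std::unique_lock<std::mutex> lock(_mutex);
        if (!_cond.wait_for(lock, timeout, [&] { return _pending > 0; }))
            return false;
        _pending--;
        return true;
    }
};

// Hypothetical timer-like server whose "RPC" function returns immediately.
// The waiting happens in the server's own worker thread, which notifies
// the client asynchronously once the delay has expired.
class Timer_server
{
    std::vector<std::thread> _workers;

  public:

    // Non-blocking call: schedule a one-shot notification and return.
    void trigger_once(std::chrono::milliseconds delay, Signal_receiver &client)
    {
        _workers.emplace_back([delay, &client] {
            std::this_thread::sleep_for(delay);
            client.submit();
        });
    }

    // Join all workers; the receiver must outlive the server.
    ~Timer_server() { for (auto &t : _workers) t.join(); }
};
```

Because the blocking happens at the client side, the client can bound its wait (here via `wait_for`) without the server needing any cancel-blocking support. Declare the `Signal_receiver` before the `Timer_server` so the server's destructor joins its workers before the receiver goes away.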

For modern components, the cancel-blocking mechanism is no longer used.
We still keep it around to uphold compatibility but I hope to eventually
remove it from the API in the not-too-distant future.

> My experiment with this (Genode 17.08, x86_32) seems to work as expected--but,
> only with the OKL4 kernel!?    With nova, hw, and seL4, the cancel_blocking()
> method executes but seemingly to no effect: the thread continues to wait on
> the (contrived) very slow RPC call, which eventually completes.
> 
> Suggestions?

When a client calls a server, it yields the flow of control to the
server until the server replies. Because a misbehaving server (e.g., a
buggy one) may never reply, the client could get stuck at that point.
There is no countermeasure for this situation. We found that
intuitively tempting countermeasures like IPC timeouts or the
cancel-blocking mechanism are bug-prone and lead to nondeterministic
system behavior. A client unconditionally expects that the server
replies to an RPC request. From a client's perspective, a server called
via RPC is similar to a regular third-party library: when calling a
library function, one can never be sure that the function will
eventually return, because it could get stuck inside the library.
Therefore, our best practice is to implement complex (bug-prone)
software as mere clients, not servers. Please refer to Section 3.2.4
"Client-server relationship" of the book for a succinct
characterization of the client and server roles within Genode.

The canonical example of this best practice is the window manager, which
is a composition of the low-complexity 'wm' component (that acts as a
server) and the potentially high-complexity (and more bug-prone)
layouter and window decorator components. The latter two components are
mere clients of the 'wm' server. Another good example is how the
(trusted) report_rom server decouples the producers and consumers of
state information. Both the producer ('Report' session client) and
consumer ('ROM' session client) are clients of the report_rom server.
They both trust the report_rom server but they don't need to trust each
other, nor does the report_rom server need to trust any of them.
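The decoupling by a trusted intermediary can be pictured with another standalone C++ sketch. It is a simplification, not the actual report_rom implementation or Genode's API. `Report_rom_server`, `Notification`, `subscribe`, `submit`, and `content` are hypothetical names. The producer and the consumer each talk only to the server. The server merely stores the latest report and bumps a counter in each consumer's notification object, so it never waits for, or executes code of, either party.

```cpp
#include <atomic>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical notification object owned by a consumer. Submitting it only
// increments a counter, so the server never runs consumer code and cannot
// get stuck in a consumer.
struct Notification
{
    std::atomic<unsigned> count { 0 };

    void submit() { count++; }

    // Consume all pending notifications; true if there were any.
    bool fetch() { return count.exchange(0) > 0; }
};

// Hypothetical, simplified stand-in for a report_rom-like server: it holds
// the latest report and informs subscribed consumers about updates.
class Report_rom_server
{
    std::mutex                  _mutex;
    std::string                 _content;
    std::vector<Notification *> _subscribers;

  public:

    // Consumer ('ROM' client) side: register a notification object.
    void subscribe(Notification &n)
    {
        std::lock_guard<std::mutex> guard(_mutex);
        _subscribers.push_back(&n);
    }

    // Producer ('Report' client) side: store new state, notify, return.
    void submit(std::string const &report)
    {
        std::lock_guard<std::mutex> guard(_mutex);
        _content = report;
        for (Notification *n : _subscribers)
            n->submit();
    }

    // Consumer side: fetch the latest state.
    std::string content()
    {
        std::lock_guard<std::mutex> guard(_mutex);
        return _content;
    }
};
```

In this sketch, a slow or stuck consumer only delays its own reaction to updates, while the producer's `submit` call still returns. Neither client depends on the other, only on the server.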

Note that throughout Genode, there are still several places where we
don't fully adhere to this practice yet. For example, NIC drivers (like
the highly complex wifi_drv) still act as servers. We will ultimately
change this so that NIC drivers become clients of the low-complexity
nic_router component.

When following this route, there is no need for the cancel-blocking
mechanism. Your observation that the cancel-blocking mechanism works for
RPCs on OKL4 is just an artifact from the past.

Sorry that the book guided you in the wrong direction. Could you please
point me to the particular part so that I can revise it?

Cheers
Norman

-- 
Dr.-Ing. Norman Feske
Genode Labs

http://www.genode-labs.com · http://genode.org

Genode Labs GmbH · Amtsgericht Dresden · HRB 28424 · Sitz Dresden
Geschäftsführer: Dr.-Ing. Norman Feske, Christian Helmuth



