thread cancel_blocking usage, kernels

Tue Oct 17 17:47:19 CEST 2017

Norman:

Thank you for the very thoughtful explanation. I found no useful examples
of cancel_blocking in core Genode, so perhaps it is safe to remove, at
least from the documentation. Alternatively, the implementations could
emit a warning when it is called.

The final sentence of Section 4.7.6 "Enslaving services" (17.05 edition)
is the one that prompted my experiment.  Possibly I took this out of
context.  Overall the Foundations book is of excellent quality, and sets
a standard for systems of this type.

Arranging component relationships so that client and server correspond to
a natural asymmetry of trustworthiness is sometimes straightforward, but
sometimes ambiguous.  E.g., should one trust calls to a log service? What
if the log service gets upgraded to log to a network host that falls under
the control of an attacker? The attacker exploits a vulnerability and owns
the logger; some critical component then halts the next time it issues a
logging call. Yes, one can, e.g., redesign the logger as a client (I have
done this), but it adds to the complexity of the other components.

In some cases, RPC might not be the most natural communications solution.
Is asynchronous message-passing (using only signals and shared memory)
feasible in Genode?  Maybe something similar to "vchan" in Xen/Qubes.
Perhaps this exists?
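
To make the question concrete, here is a minimal sketch of what I have in
mind. It is plain C++ and uses no Genode API: `Signal` and `Ring` are
hypothetical names, a condition variable stands in for a signal, and an
in-process array stands in for a shared-memory dataspace. The only point
is that the sender never yields its flow of control to the receiver.

```cpp
#include <array>
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <mutex>

/* Hypothetical stand-in for a signal: it only conveys "something happened" */
class Signal
{
	std::mutex              _mutex;
	std::condition_variable _cond;
	unsigned                _pending = 0;

public:

	/* called by the sender, returns immediately */
	void submit()
	{
		{ std::lock_guard<std::mutex> guard(_mutex); ++_pending; }
		_cond.notify_one();
	}

	/* called by the receiver, blocks until at least one signal is pending */
	void wait()
	{
		std::unique_lock<std::mutex> lock(_mutex);
		_cond.wait(lock, [this] { return _pending > 0; });
		_pending = 0;
	}
};

/*
 * Hypothetical stand-in for a ring in shared memory, one producer and one
 * consumer. One slot is kept free, so the capacity is N - 1 messages.
 */
template <typename T, std::size_t N>
class Ring
{
	std::array<T, N>         _slots {};
	std::atomic<std::size_t> _head { 0 };  /* written by the consumer only */
	std::atomic<std::size_t> _tail { 0 };  /* written by the producer only */

public:

	/* never blocks, returns false if the ring is full */
	bool try_send(T const &msg)
	{
		std::size_t const tail = _tail.load(std::memory_order_relaxed);
		std::size_t const next = (tail + 1) % N;

		if (next == _head.load(std::memory_order_acquire))
			return false;

		_slots[tail] = msg;
		_tail.store(next, std::memory_order_release);
		return true;
	}

	/* never blocks, returns false if the ring is empty */
	bool try_receive(T &out)
	{
		std::size_t const head = _head.load(std::memory_order_relaxed);

		if (head == _tail.load(std::memory_order_acquire))
			return false;

		out = _slots[head];
		_head.store((head + 1) % N, std::memory_order_release);
		return true;
	}
};
```

A logging client would call try_send() followed by submit() and carry on,
whether or not the logger ever drains the ring. A stalled logger then shows
up as a false return value from try_send() rather than as a blocked caller.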

// Steve

On 10/17/2017 04:32 AM, Norman Feske wrote:
> Hi Steve,
> 
> On 12.10.2017 00:26, Steven Harp wrote:
>> The Genode book suggests that an RPC caller can protect itself from blocking
>> in a stalled server by creating a watchdog thread to monitor the progress of
>> the call, and cancel it if it takes too long.
>>
>> Is there a robust/canonical example of using cancel_blocking in this way?
> 
> I am afraid that the book misled you towards an outdated direction. The
> cancel-blocking mechanism was introduced very early at a time when we
> routinely designed inter-component interfaces that were blocking at the
> server side. At that time, L4 kernels did not support any means of
> asynchronous notifications, thereby luring us into this direction.
> Later, we realized this mistake and successively redesigned the
> interfaces [1] to use a combination of synchronous RPCs that immediately
> return and asynchronous notifications for blocking at the client side.
> We announced this transition in May last year [2] and finished it in May
> this year.
> 
> [1] https://genode.org/documentation/release-notes/13.02#Timer_interface_turned_into_asynchronous_mode_of_operation
> [2]
> https://genode.org/documentation/release-notes/16.05#The_great_API_renovation
> 
> For modern components, the cancel-blocking mechanism is no longer used.
> We still keep it around to uphold compatibility but I hope to eventually
> remove it from the API in the not-too-distant future.
> 
>> My experiment with this (Genode 17.08, x86_32) seems to work as expected, but
>> only with the OKL4 kernel!? With nova, hw, and seL4, the cancel_blocking()
>> method executes but seemingly to no effect: the thread continues to wait on
>> the (contrived) very slow RPC call, which eventually completes.
>>
>> Suggestions?
> 
> When a client calls a server, it ultimately yields the flow of control
> to the server until the server replies. Because a misbehaving server may
> never reply, e.g., because of a bug, the client could get stuck at that
> point. There is no countermeasure for this situation. We found that
> potential countermeasures like IPC timeouts or the cancel-blocking
> mechanism, although intuitively tempting, are bug-prone and lead to
> nondeterministic system behavior.  A client unconditionally expects that
> the server replies to an RPC request. From a client's perspective, a
> server called via RPC functions is similar to a regular third-party
> library. When calling a library function, one can never be sure that the
> function will eventually return. It could get stuck in the library.
> Therefore, we devised the best practice of implementing complex (bug-prone)
> software as mere clients, not servers. Please consider Section 3.2.4
> "Client-server relationship" of the book for a succinct characterization
> of the client and server roles within Genode.
> 
> The canonical example of this best practice is the window manager, which
> is a composition of the low-complexity 'wm' component (that acts as a
> server) and the potentially high-complexity (and more bug-prone)
> layouter and window decorator components. The latter two components are
> mere clients of the 'wm' server. Another good example is the way the
> (trusted) report_rom server decouples the producers and consumers of
> state information. Both the producer ('Report' session client) and
> consumer ('ROM' session client) are clients of the report_rom server.
> They both trust the report_rom server but they don't need to trust each
> other, nor does the report_rom server need to trust any of them.
> 
> Note that throughout Genode, there are still several places where we
> don't fully adhere to this practice yet. E.g., NIC drivers (like the
> highly complex wifi_drv) still act as servers. But we will ultimately
> change this so that NIC drivers become clients of the low-complexity
> nic_router component.
> 
> When following this route, there is no need for the cancel-blocking
> mechanism. Your observation that the cancel-blocking mechanism works for
> RPCs on OKL4 is just an artifact from the past.
> 
> Sorry that the book guided you in the wrong direction. Could you please
> point me to the particular part so that I can revise it?
> 
> Cheers
> Norman
>