Norman:
Thank you for the very thoughtful explanation. I found no useful examples of cancel_blocking in core Genode, so perhaps it is safe to remove, at least from the documentation, or to have the implementations emit a warning when it is used.
The final sentence of Section 4.7.6 "Enslaving services" is the one that suggested my experiment (17.05 edition). Possibly I took this out of context. Overall the Foundations book is of excellent quality, and sets a standard for systems of this type.
Arranging component relationships so that client and server correspond to a natural asymmetry of trustworthiness is sometimes straightforward, but sometimes ambiguous. E.g. should one trust calls to a log service? What if the log service gets upgraded to log to a network host that falls under control of an attacker? The attacker exploits a vulnerability and owns the logger; some critical component then halts the next time it issues a logging call. Yes, you can e.g. redesign the logger as a client--I've done this, but it adds to the complexity of other components.
In some cases, RPC might not be the most natural communications solution. Is asynchronous message-passing (using only signals and shared memory) feasible in Genode? Maybe something similar to "vchan" in Xen/Qubes. Perhaps this exists?
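To make the question concrete, here is a minimal sketch of the kind of channel I have in mind. It is a plain Python model, not Genode or Xen code: the list stands in for a shared-memory region, the threading.Event stands in for a signal, and the names Channel, try_send, try_recv and data_avail are all hypothetical. Neither side ever blocks inside the other's code.

```python
import threading


class Channel:
    """One-way channel: a fixed ring of slots plus a data-available signal."""

    def __init__(self, slots):
        self._ring = [None] * slots          # stands in for shared memory
        self._head = 0                       # next slot to read (consumer side)
        self._tail = 0                       # next slot to write (producer side)
        self.data_avail = threading.Event()  # stands in for a signal

    def try_send(self, msg):
        """Store msg and raise the signal; return False if the ring is full."""
        nxt = (self._tail + 1) % len(self._ring)
        if nxt == self._head:                # one slot always stays empty
            return False
        self._ring[self._tail] = msg
        self._tail = nxt
        self.data_avail.set()
        return True

    def try_recv(self):
        """Return the oldest message, or None if the ring is empty."""
        if self._head == self._tail:
            return None
        msg = self._ring[self._head]
        self._head = (self._head + 1) % len(self._ring)
        return msg


if __name__ == "__main__":
    ch = Channel(slots=4)
    ch.try_send("hello")
    ch.try_send("world")
    if ch.data_avail.is_set():
        ch.data_avail.clear()                # clear first, then drain
        while (msg := ch.try_recv()) is not None:
            print(msg)
```

The consumer clears the signal before draining, so a message that arrives during the drain raises the signal again instead of being lost. A full ring is reported to the sender rather than waited on, which leaves the backpressure policy to the sender.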
// Steve
On 10/17/2017 04:32 AM, Norman Feske wrote:
Hi Steve,
On 12.10.2017 00:26, Steven Harp wrote:
The Genode book suggests that an RPC caller can protect itself from blocking in a stalled server by creating a watchdog thread to monitor the progress of the call, and cancel it if it takes too long.
Is there a robust/canonical example of using cancel_blocking in this way?
I am afraid that the book misled you towards an outdated direction. The cancel-blocking mechanism was introduced very early at a time when we routinely designed inter-component interfaces that were blocking at the server side. At that time, L4 kernels did not support any means of asynchronous notifications, thereby luring us into this direction. Later, we realized this mistake and successively redesigned the interfaces [1] to use a combination of synchronous RPCs that immediately return and asynchronous notifications for blocking at the client side. We announced this transition in May last year [2] and finished it in May this year.
[1] https://genode.org/documentation/release-notes/13.02#Timer_interface_turned_into_asynchronous_mode_of_operation
[2] https://genode.org/documentation/release-notes/16.05#The_great_API_renovatio...
For modern components, the cancel-blocking mechanism is no longer used. We still keep it around to uphold compatibility but I hope to eventually remove it from the API in the not-too-distant future.
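The combination described above (an RPC that returns immediately, plus an asynchronous notification the client blocks on at its own side) can be sketched roughly as follows. This is a plain Python model under stated assumptions, not the Genode API: TimerServer, sigh, trigger_once and wait_for_timeout are hypothetical names, and a threading.Event stands in for a signal.

```python
import threading


class TimerServer:
    """Toy server: every call returns at once; completion arrives as a signal."""

    def __init__(self):
        self._signal = None

    def sigh(self, signal):
        # Register the client's notification object (hypothetical name).
        self._signal = signal

    def trigger_once(self, seconds):
        # Schedule the notification and return immediately.
        timer = threading.Timer(seconds, self._signal.set)
        timer.daemon = True
        timer.start()


def wait_for_timeout(server, seconds, patience):
    """Client side: issue the non-blocking call, then block locally."""
    signal = threading.Event()
    server.sigh(signal)
    server.trigger_once(seconds)
    # The client waits on its own signal, so it decides how long to wait.
    return signal.wait(timeout=patience)


if __name__ == "__main__":
    server = TimerServer()
    print(wait_for_timeout(server, 0.05, patience=1.0))  # True: signal arrived
    print(wait_for_timeout(server, 5.0, patience=0.05))  # False: client moved on
```

In this model the client's flow of control never rests inside the server, so nothing needs to be cancelled.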
My experiment with this (Genode 17.08, x86_32) seems to work as expected--but, only with the OKL4 kernel!? With nova, hw, and seL4, the cancel_blocking() method executes but seemingly to no effect: the thread continues to wait on the (contrived) very slow RPC call, which eventually completes.
Suggestions?
When a client calls a server, it ultimately yields the flow of control to the server until the server replies. Because a misbehaving server may never reply, e.g., because of a bug, the client could get stuck at that point. There is no countermeasure for this situation. We found that potential countermeasures like IPC timeouts or the cancel-blocking mechanism, although intuitively tempting, are bug-prone and lead to nondeterministic system behavior. A client unconditionally expects that the server replies to an RPC request. From a client's perspective, a server called via RPC functions is similar to a regular third-party library. When calling a library function, one can never be sure that the function will eventually return. It could get stuck in the library. Therefore, we recommend as best practice to implement complex (bug-prone) software as mere clients, not servers. Please consider Section 3.2.4 "Client-server relationship" of the book for a succinct characterization of the client and server roles within Genode.
The canonical example of this best practice is the window manager, which is a composition of the low-complexity 'wm' component (that acts as a server) and the potentially high-complexity (and more bug-prone) layouter and window decorator components. The latter two components are mere clients of the 'wm' server. Another good example is the way the (trusted) report_rom server decouples the producers and consumers of state information. Both the producer ('Report' session client) and consumer ('ROM' session client) are clients of the report_rom server. They both trust the report_rom server but they don't need to trust each other, nor does the report_rom server need to trust any of them.
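The report_rom arrangement can be modelled in a few lines. This is a simplified sketch, not the actual report_rom implementation or its session interfaces: the class ReportRom, its methods report, sigh and rom, and the label "window_list" are hypothetical, and a threading.Event stands in for a signal.

```python
import threading


class ReportRom:
    """Toy broker: keeps the latest report per label and flags consumers."""

    def __init__(self):
        self._content = {}
        self._signals = {}

    def report(self, label, data):
        # Producer call: store the data, raise the flags, return.
        self._content[label] = data
        for signal in self._signals.get(label, []):
            signal.set()  # setting a flag cannot block the broker

    def sigh(self, label, signal):
        # Consumer call: register a notification object for a label.
        self._signals.setdefault(label, []).append(signal)

    def rom(self, label):
        # Consumer call: return the current content immediately.
        return self._content.get(label)


if __name__ == "__main__":
    broker = ReportRom()
    changed = threading.Event()
    broker.sigh("window_list", changed)          # consumer subscribes
    broker.report("window_list", "<windows/>")   # producer publishes
    if changed.is_set():
        print(broker.rom("window_list"))
```

Every call into the broker returns without waiting on the other party, so a stalled consumer cannot hold up the producer, and a stalled producer only means that the consumer sees no update.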
Note that throughout Genode, there are still several places where we don't fully adhere to this practice yet. E.g., NIC drivers (like the highly complex wifi_drv) still act as servers. But we will ultimately change this so that NIC drivers become clients of the low-complexity nic_router component.
When following this route, there is no need for the cancel-blocking mechanism. Your observation that the cancel-blocking mechanism works for RPCs on OKL4 is just an artifact from the past.
Sorry that the book guided you in the wrong direction. Could you please point me to the particular part so that I can revise it?
Cheers Norman