Re: vfs fifo pipe and file system session write

9 Dec 2020

Hi Norman,
Thanks for your explanation.
...
...
We think the only proper fix would be a change in the file system 
session to allow the write operation to report back the number of
bytes written. Is there a specific reason to keep the write
'fire-and-forget', other than simplicity?
I'm afraid that your assessment is not correct. There is indeed a
"specific reason" behind the design. Let me explain.
Intuitively, letting the write operation return the number of written
bytes would be a no-brainer. This is what is suggested by the POSIX
write function after all. In practice, however, this approach implies
two hard problems:

To know the number of bytes written, the caller has to wait for
the acknowledgement of the write operation. This synchronization
point adds the end-to-end latency of the write operation to each
individual write request. This is particularly bad when issuing
sequences of write operations. Effectively, the need for waiting
for the acknowledgement removes the opportunity of batching file-
system requests. In a component-based system where we need to
consider a chain of file-system servers, this problem is amplified.
In contrast, our design facilitates the hiding of write latency
using the principal approach of pipelining. The contract of the
write operation is simple: When the client was able to successfully
enqueue the write request into the file-system session's packet
stream, the operation is successful. The client can immediately
resume execution regardless of the latency of the write operation.
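
The enqueue contract described above can be illustrated with a minimal C sketch. The names `packet_queue`, `submit_write`, and `process_one` are hypothetical and are not Genode's actual packet-stream API. The sketch only shows that success is reported at enqueue time rather than at completion time.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a packet stream: a bounded ring of requests. */
enum { QUEUE_SLOTS = 4 };

struct write_request { const char *data; size_t len; };

struct packet_queue {
    struct write_request slots[QUEUE_SLOTS];
    size_t head;   /* index of the oldest pending request */
    size_t count;  /* number of pending requests */
};

/* Client side: succeeds as soon as the request is enqueued. No
 * acknowledgement is awaited, so the caller can continue immediately. */
static bool submit_write(struct packet_queue *q, const char *data, size_t len)
{
    if (q->count == QUEUE_SLOTS)
        return false;  /* queue full: the caller has to wait for a free slot */
    size_t tail = (q->head + q->count) % QUEUE_SLOTS;
    q->slots[tail] = (struct write_request){ data, len };
    q->count++;
    return true;
}

/* Server side: takes the oldest request off the queue, strictly in order. */
static bool process_one(struct packet_queue *q, struct write_request *out)
{
    if (q->count == 0)
        return false;
    *out = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_SLOTS;
    q->count--;
    return true;
}
```

A client can submit up to `QUEUE_SLOTS` requests back to back without waiting for the server, which is the batching opportunity that a synchronous byte-count result would remove.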

Very true, but this applies also to the read operation where such
behavior cannot be avoided.
...

Assuming that we reflected the number of written bytes to the
client, what would a client do with this information? There are
two likely answers.
(a) The client ignores this information. This is what happens
    in the real world for most users of the POSIX write call.
    Partial writes are generally not anticipated by application
    software because they don't happen on commodity systems.
    In this case, data would get lost.
    Anecdotally, we have repeatedly encountered this problem
    with ported software using previous versions of our VFS/libc,
    which happened to reflect partial writes to the applications
    at that time.

True, but Linux does not perform a non-blocking write to a full pipe
at all. The call fails and errno is set to EAGAIN.
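
The behavior can be checked with a short C sketch that uses only POSIX calls. The helper name `fill_pipe_until_refused` and the 512-byte chunk size are choices made for this illustration. The chunk size stays within the POSIX minimum for PIPE_BUF, so each write is either accepted completely or refused.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Fills a fresh pipe with non-blocking writes until the kernel refuses
 * further data. Returns the errno of the refused write, or -1 if the
 * setup failed. *accepted receives the number of bytes stored before. */
static int fill_pipe_until_refused(size_t *accepted)
{
    int fds[2];
    *accepted = 0;
    if (pipe(fds) != 0)
        return -1;

    /* switch the write end to non-blocking mode */
    int flags = fcntl(fds[1], F_GETFL);
    if (flags < 0 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    char chunk[512];
    memset(chunk, 'x', sizeof chunk);

    int result = -1;
    for (;;) {
        ssize_t n = write(fds[1], chunk, sizeof chunk);
        if (n < 0) {
            result = errno;  /* EAGAIN is expected once the pipe is full */
            break;
        }
        *accepted += (size_t)n;
    }

    close(fds[0]);
    close(fds[1]);
    return result;
}
```

Writes larger than PIPE_BUF are not covered by this sketch. For those, POSIX allows a non-blocking write to transfer only part of the data.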
...
(b) The client would respond by issuing another write operation
    with the remaining content. In a scenario where a write
    operation can only be done partially (e.g., pipe is full like
    in your example), this approach would ultimately result in a
    busy loop.
There are good reasons to implement an application with non-blocking
writes and select. The most common one is to avoid complicating the code
with thread synchronization.
...
In short, our design tries to leverage async I/O for hiding the latency
of write operations, and it yields the desired behavior of blocking
instead of busy-looping when no progress can be made.
As far as I understand, this leads to a blocking write when all buffers
are full, even when the application intended a non-blocking write. I'm
not convinced this is a good solution with regard to compatibility when
porting POSIX/libc applications.
Also, when the write fails for another reason, such as a lack of disk
space, the application can't detect that problem and data may be lost.
...
...
We are still trying to introduce a named pipe for communication between
libc components and pure Genode components. Our VFS FIFO pipe now works
with multiple threads in the same process, but only when it's hosted in
the VFS of that same component. As soon as we host it in a separate VFS
component, it stalls randomly.
My interpretation of the scenario:

In general, multiple operations can be enqueued to a single file-
system session. The VFS server processes the submitted operations
strictly in order. The file-system session is a serialization point.

The VFS pipe plugin introduces a dependency of write operations on
read operations. If the pipe is full, a write operation has to stall
until a reader has consumed data from the pipe.

The pipe buffer is bounded.

Your client uses a single file system session to submit both read and
write requests to the VFS server.

What happens:
The pipe buffer is saturated by previous write operations.

The client issues a write operation that exceeds the remaining capacity
of the pipe buffer. Consequently, the write request stays in the packet
stream to be picked up by the VFS server the next time data can be
consumed. Each time the VFS server observes I/O, it tries to resume the
write operation. This is done piece by piece until the entire request is
completely processed. In your case, the write would stall until the pipe
buffer has gained some new room.

As file operations are processed strictly in order, the partially
processed write operation clogs up the file-system session's packet
stream. This is because the file-system session is a serialization point.

The client submits a read operation to the file-system session. Even
though the operation got enqueued, the VFS server never looks at it
because it is still concerned with the not-yet-completed write operation.

A deadlock occurs because the read operation - which is second in the
queue - would be required for the progress of the write operation (first
in the queue).
What can you do about it?
The interlocking of inter-dependent read and write operations in one
data channel must be avoided. In a multi-component scenario, the reader
and the writer are separate components, each having a distinct
file-system session. So this situation does not occur.
For your single-component scenario, you may consider using two
file-system sessions, one for the reading end and one for the writing
end of the pipe. Both sessions would be routed to the same VFS server.
  <vfs>
    ...
    <dir name="reader"> <fs label="pipe"/> </dir>
    <dir name="writer"> <fs label="pipe"/> </dir>
  </vfs>
This way, read and write operations cannot interlock.
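
The stall and the effect of the second session can be reproduced with a small C model. It is a simplified sketch and not the VFS server's code. All names (`session`, `submit`, `run_server`) and the capacity of 8 bytes are hypothetical. The model assumes that the server looks only at the head operation of each session, that a write completes once all its bytes are stored, and that a read completes as soon as it has obtained some data.

```c
#include <stdbool.h>
#include <stddef.h>

enum { PIPE_CAPACITY = 8, MAX_OPS = 8 };

enum op_kind { OP_READ, OP_WRITE };

struct op { enum op_kind kind; size_t bytes; /* assumed > 0 */ };

/* One file-system session: operations are processed strictly in order. */
struct session {
    struct op ops[MAX_OPS];
    size_t count;  /* number of submitted operations */
    size_t next;   /* index of the operation at the head of the queue */
};

static void submit(struct session *s, enum op_kind kind, size_t bytes)
{
    if (s->count < MAX_OPS)
        s->ops[s->count++] = (struct op){ kind, bytes };
}

/* Tries to advance the head operation of one session.
 * Returns true if any bytes were moved. */
static bool advance_head(struct session *s, size_t *fill)
{
    if (s->next == s->count)
        return false;
    struct op *op = &s->ops[s->next];
    size_t room = (op->kind == OP_WRITE) ? PIPE_CAPACITY - *fill : *fill;
    size_t n = op->bytes < room ? op->bytes : room;
    if (n == 0)
        return false;  /* pipe full (write) or empty (read): head stalls */
    if (op->kind == OP_WRITE)
        *fill += n;
    else
        *fill -= n;
    op->bytes -= n;
    if (op->kind == OP_READ || op->bytes == 0)
        s->next++;     /* operation completed, the next one becomes head */
    return true;
}

/* Runs the server until no session can make progress.
 * Returns the number of operations that remain stuck. */
static size_t run_server(struct session *sessions, size_t num_sessions)
{
    size_t fill = 0;
    bool progress = true;
    while (progress) {
        progress = false;
        for (size_t i = 0; i < num_sessions; i++)
            if (advance_head(&sessions[i], &fill))
                progress = true;
    }
    size_t stuck = 0;
    for (size_t i = 0; i < num_sessions; i++)
        stuck += sessions[i].count - sessions[i].next;
    return stuck;
}
```

Submitting "write 8, write 4, read 4" to a single session leaves two operations stuck, because the second write waits for room and the read behind it is never looked at. Submitting the two writes to one session and the read to another lets all three complete.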
Thanks, this solves our problem.
...
...
This leads to a permanent blocking
of the vfs component as no read operation can alleviate the full buffer
of the fifo pipe.
The statement puzzles me because the VFS server must never block. Are
you sure that the server is blocking, not merely stalling a single
session? Please connect an unrelated component to the VFS server to see
whether it remains responsive or not. The latter case would be a bug (of
the VFS server, the VFS library, or one of the used plugins).
You are, of course, correct: the VFS component doesn't block but retries
the same write later without success. A read from another session works
just fine and lets the write succeed.
Bests
Stefan
-- 
Freundliche Grüsse

Stefan Thöni
Chairman of the Board
Senior Security Architect
+41 79 610 64 95

gapfruit AG
Baarerstrasse 135
6300 Zug
https://gapfruit.com
