vfs fifo pipe and file system session write

Norman Feske norman.feske at genode-labs.com
Wed Dec 9 13:51:58 CET 2020

Hi Stefan,

> We think the only proper fix would be a change in the file system 
> session to allow the write operation to report back the number of
> bytes written. Is there a specific reason to keep the write
> 'fire-and-forget', other than simplicity?

I'm afraid that your assessment is not correct. There is indeed a
"specific reason" behind the design. Let me explain.

Intuitively, letting the write operation return the number of written
bytes would be a no-brainer. This is, after all, what the POSIX write
function suggests. In practice, however, this approach poses two hard
problems:

1. To know the number of bytes written, the caller has to wait for
   the acknowledgement of the write operation. This synchronization
   point adds the end-to-end latency of the write operation to each
   individual write request. This is particularly bad when issuing
   sequences of write operations. Effectively, the need to wait for
   the acknowledgement removes the opportunity to batch file-system
   requests. In a component-based system where we need to consider a
   chain of file-system servers, this problem is amplified.

   In contrast, our design hides the write latency using the general
   technique of pipelining. The contract of the write operation is
   simple: if the client successfully enqueues the write request into
   the file-system session's packet stream, the operation is
   successful. The client can immediately resume execution regardless
   of the latency of the write operation.

2. Assuming that we reflected the number of written bytes to the
   client, what would a client do with this information? There are
   two likely answers.

   (a) The client ignores this information. This is what happens
       in the real world for most users of the POSIX write call.
       Partial writes are generally not anticipated by application
       software because they rarely happen in practice on commodity
       systems. In this case, data would be lost.

       Anecdotally, we have repeatedly encountered this problem
       with ported software using previous versions of our VFS/libc,
       which happened to reflect partial writes to the applications
       at that time.

   (b) The client would respond by issuing another write operation
       with the remaining content. In a scenario where a write
       operation can only be done partially (e.g., when the pipe is
       full, as in your example), this approach would ultimately
       result in a busy loop.
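To make point 1 concrete, here is a back-of-the-envelope sketch under a deliberately simplified cost model that is an assumption of this example (fixed per-hop latency, fixed service time per request, no contention). `Cost_model` and the helper functions are hypothetical names, not part of Genode. Waiting for each acknowledgement pays the round trip once per request, whereas pipelining pays it once for the whole sequence:

```cpp
#include <cassert>
#include <cstdint>

// Simplified, hypothetical cost model (not measured Genode numbers).
struct Cost_model
{
    std::uint64_t hop_latency;   // ticks for one hop between components
    std::uint64_t service_time;  // ticks the VFS server spends per request
    unsigned      hops;          // number of hops from client to VFS server
};

// Time for a request to travel down the server chain and for the
// acknowledgement to travel back.
std::uint64_t round_trip(Cost_model const &m)
{
    assert(m.hops >= 1);  // at least the hop to the VFS server itself
    return 2ull * m.hops * m.hop_latency;
}

// Synchronous writes: the client waits for each acknowledgement before
// submitting the next request, so every request pays the full round trip.
std::uint64_t synchronous_total(Cost_model const &m, std::uint64_t n)
{
    return n * (round_trip(m) + m.service_time);
}

// Pipelined writes: the client enqueues all n requests without waiting and
// the server processes them back to back, so the round trip is paid once.
// The result is the time until the last request is acknowledged.
std::uint64_t pipelined_total(Cost_model const &m, std::uint64_t n)
{
    return n == 0 ? 0 : round_trip(m) + n * m.service_time;
}
```

With a chain of 3 hops, 10 ticks per hop, and 1 tick of service time, 100 synchronous writes take 6100 ticks until the last acknowledgement, versus 160 ticks when pipelined. The gap grows with the length of the server chain.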

In short, our design tries to leverage async I/O for hiding the latency
of write operations, and it yields the desired behavior of blocking
instead of busy looping when no progress can be made.
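The difference between blocking and busy looping can be illustrated with a plain C++ bounded buffer. This is a hypothetical stand-in, not the VFS pipe plugin: `Bounded_pipe` and `transfer` are illustrative names, and the sketch uses `std::condition_variable` in place of Genode's I/O signalling. A writer facing a full buffer sleeps until a reader frees room, and it returns only once the whole request has been buffered, so the caller never sees a partial write:

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical bounded pipe, not Genode's VFS pipe plugin. A writer that
// cannot make progress sleeps on a condition variable instead of retrying in
// a busy loop, and is woken once a reader has freed some room.
class Bounded_pipe
{
    std::size_t const       _capacity;
    std::deque<char>        _buffer { };
    std::mutex              _mutex { };
    std::condition_variable _room_available { };
    std::condition_variable _data_available { };

  public:

    explicit Bounded_pipe(std::size_t capacity) : _capacity(capacity)
    {
        assert(capacity > 0);
    }

    // Buffers all 'len' bytes, sleeping whenever the buffer is full. The
    // call returns only after the whole request is buffered, so the caller
    // never observes a partial write. (Bytes of concurrent writers may
    // interleave, which this sketch does not prevent.)
    void write(char const *src, std::size_t len)
    {
        std::unique_lock<std::mutex> lock(_mutex);
        std::size_t i = 0;
        while (i < len) {
            _room_available.wait(lock, [&] { return _buffer.size() < _capacity; });
            while (i < len && _buffer.size() < _capacity)
                _buffer.push_back(src[i++]);
            _data_available.notify_all();
        }
    }

    // Reads up to 'max' bytes, sleeping until at least one byte is available.
    std::size_t read(char *dst, std::size_t max)
    {
        assert(max > 0);
        std::unique_lock<std::mutex> lock(_mutex);
        _data_available.wait(lock, [&] { return !_buffer.empty(); });
        std::size_t n = 0;
        while (n < max && !_buffer.empty()) {
            dst[n++] = _buffer.front();
            _buffer.pop_front();
        }
        _room_available.notify_all();
        return n;
    }
};

// Usage sketch: a writer thread pushes 'total' bytes through the pipe while
// the calling thread drains it. Returns the number of bytes received.
std::size_t transfer(std::size_t capacity, std::size_t total)
{
    Bounded_pipe pipe(capacity);
    std::vector<char> payload(total, 'x');
    std::thread writer([&] { pipe.write(payload.data(), payload.size()); });

    std::size_t received = 0;
    char buf[8];
    while (received < total)
        received += pipe.read(buf, sizeof(buf));

    writer.join();
    return received;
}
```

For example, `transfer(4, 1000)` moves 1000 bytes through a 4-byte buffer while neither thread spins: whichever side cannot make progress simply sleeps.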

> We are still trying to introduce a named pipe for communication between
> libc components and pure genode components. Our vfs fifo pipe now works
> with multiple threads in the same process, but only when it's hosted in
> the vfs of that same component. As soon as we host it in separate vfs
> component it stalls randomly.

My interpretation of the scenario:

- In general, multiple operations can be enqueued to a single file-
  system session. The VFS server processes the submitted operations
  strictly in order. The file-system session is a serialization point.

- The VFS pipe plugin makes write operations depend on read
  operations. If the pipe is full, a write operation has to stall
  until a reader has consumed data from the pipe.

- The pipe buffer is bounded.

- Your client uses a single file system session to submit both read and
  write requests to the VFS server.

What happens:

The pipe buffer is saturated by previous write operations.

The client issues a write operation that exceeds the remaining capacity
of the pipe buffer. Consequently, the write request stays in the packet
stream to be picked up by the VFS server the next time data can be
consumed. Each time the VFS server observes I/O, it tries to resume the
write operation, piece by piece, until the entire request is processed.
In your case, the write stalls until the pipe buffer has gained some new
room.
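The piece-by-piece resumption can be modelled in a few lines of C++. The model is an assumption for illustration only (`Pipe_state`, `Pending_write`, `resume`, and `events_until_complete` are hypothetical names, not VFS server internals): at each observed I/O event a reader drains some bytes, after which the server moves as much of the pending write as now fits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical model of the server resuming a pending write piece by piece.
// The names are illustrative and do not correspond to VFS server internals.
struct Pipe_state
{
    std::size_t capacity;
    std::size_t used;

    std::size_t room() const { return capacity - used; }
};

struct Pending_write
{
    std::size_t remaining;  // bytes of the request not yet in the pipe

    bool complete() const { return remaining == 0; }
};

// Called whenever the server observes I/O: moves as many bytes as currently
// fit into the pipe and returns the number of bytes transferred.
std::size_t resume(Pending_write &w, Pipe_state &pipe)
{
    std::size_t const n = std::min(w.remaining, pipe.room());
    w.remaining -= n;
    pipe.used   += n;
    assert(pipe.used <= pipe.capacity);
    return n;
}

// Counts the I/O events needed to complete a write of 'request' bytes when a
// reader drains up to 'drain' bytes per event. Returns std::nullopt if the
// write can never complete because nobody makes room in the pipe.
std::optional<std::size_t>
events_until_complete(std::size_t request, Pipe_state pipe, std::size_t drain)
{
    Pending_write w { request };
    resume(w, pipe);  // first attempt when the request is picked up

    std::size_t events = 0;
    while (!w.complete()) {
        if (drain == 0 || pipe.capacity == 0)
            return std::nullopt;
        pipe.used -= std::min(drain, pipe.used);  // a reader consumes data
        ++events;
        resume(w, pipe);                          // server retries the write
    }
    return events;
}
```

For a saturated 8-byte pipe and a 10-byte write, a reader draining 3 bytes per event lets the write complete after 4 events; with no reader draining the pipe, the write never completes.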

As file operations are processed strictly in order, the partially
processed write operation clogs up the file-system session's packet
stream. The file-system session is a serialization point.

The client submits a read operation to the file-system session. Even
though the operation is enqueued, the VFS server never looks at it
because it is still busy with the not-yet-completed write operation.

A deadlock occurs because the read operation, which is second in the
queue, would be required for the write operation (first in the queue)
to make progress.
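The deadlock can be reproduced with a small, hypothetical model of one session. The types and functions below are illustrative and not taken from the Genode sources. The server only ever works on the head of the session's queue, so a stalled write at the head hides the read behind it:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical model, not Genode code: a file-system session is a FIFO of
// requests that the server processes strictly in order. The head request
// must complete before the server looks at the next one.
enum class Op { WRITE, READ };

struct Request
{
    Op          op;
    std::size_t remaining;  // bytes still to be written or read
};

struct Pipe
{
    std::size_t capacity;
    std::size_t used;
};

// Works on the head request only. Returns true if bytes moved or the head
// request completed, i.e., if the server made progress.
bool step(std::deque<Request> &session, Pipe &pipe)
{
    if (session.empty())
        return false;

    Request &head = session.front();
    bool const is_write = (head.op == Op::WRITE);
    std::size_t const n = is_write
        ? std::min(head.remaining, pipe.capacity - pipe.used)  // free room
        : std::min(head.remaining, pipe.used);                 // buffered data

    if (is_write) pipe.used += n; else pipe.used -= n;
    head.remaining -= n;
    assert(pipe.used <= pipe.capacity);

    bool const completed = (head.remaining == 0);
    if (completed)
        session.pop_front();  // only now does the next request get a turn
    return n > 0 || completed;
}

// Runs the server until no further progress is possible. Returns true if
// every request completed, false if the session is stuck.
bool run(std::deque<Request> session, Pipe pipe)
{
    while (step(session, pipe)) { }
    return session.empty();
}
```

With the pipe saturated (8 of 8 bytes used), the queue `WRITE 4, READ 8` gets stuck immediately, while the reverse order `READ 8, WRITE 4` completes.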

What can you do about it?

The interlocking of inter-dependent read and write operations in one
data channel must be avoided. In a multi-component scenario, the reader
and the writer are separate components, each with its own file-system
session, so this situation does not occur.

For your single-component scenario, you may consider using two
file-system sessions, one for the reading end and one for the writing
end of the pipe. Both sessions would be routed to the same VFS server.

    <dir name="reader"> <fs label="pipe"/> </dir>
    <dir name="writer"> <fs label="pipe"/> </dir>

This way, read and write operations cannot interlock.
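Extending a hypothetical model of in-order request processing to several sessions shows why two sessions help: each queue is still processed strictly in order, but a stalled head request only holds back its own session, so the reader's session can drain the pipe and unblock the writer. The names are illustrative, and the round-robin scheduling across sessions is an assumption of the sketch, not a description of the VFS server:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical model, not Genode code: each session is a FIFO processed
// strictly in order, and the server visits all sessions in turn
// (round-robin scheduling is an assumption of this sketch).
enum class Op { WRITE, READ };

struct Request
{
    Op          op;
    std::size_t remaining;  // bytes still to be written or read
};

struct Pipe
{
    std::size_t capacity;
    std::size_t used;
};

// Works on the head request of one session. Returns true on progress.
bool step(std::deque<Request> &session, Pipe &pipe)
{
    if (session.empty())
        return false;

    Request &head = session.front();
    bool const is_write = (head.op == Op::WRITE);
    std::size_t const n = is_write
        ? std::min(head.remaining, pipe.capacity - pipe.used)
        : std::min(head.remaining, pipe.used);

    if (is_write) pipe.used += n; else pipe.used -= n;
    head.remaining -= n;
    assert(pipe.used <= pipe.capacity);

    bool const completed = (head.remaining == 0);
    if (completed)
        session.pop_front();
    return n > 0 || completed;
}

// A stalled head request only holds back its own session; the other
// sessions still get served. Returns true if all requests completed.
bool run_sessions(std::vector<std::deque<Request>> sessions, Pipe pipe)
{
    bool progress = true;
    while (progress) {
        progress = false;
        for (auto &session : sessions)
            while (step(session, pipe))
                progress = true;
    }
    for (auto const &session : sessions)
        if (!session.empty())
            return false;
    return true;
}
```

On a saturated 8-byte pipe, placing `WRITE 4` and `READ 8` into separate sessions lets both complete, whereas putting both into a single session gets stuck.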

> This leads to a permanent blocking
> of the vfs component as no read operation can alleviate the full buffer
> of the fifo pipe.

This statement puzzles me because the VFS server must never block. Are
you sure that the server is blocking, not merely stalling a single
session? Please connect an unrelated component to the VFS server to see
whether it remains responsive. If it does not, that would be a bug (in
the VFS server, the VFS library, or one of the used plugins).


Dr.-Ing. Norman Feske
Genode Labs

https://www.genode-labs.com · https://genode.org

Genode Labs GmbH · Amtsgericht Dresden · HRB 28424 · Sitz Dresden
Geschäftsführer: Dr.-Ing. Norman Feske, Christian Helmuth
