vfs fifo pipe and file system session write

Wed Dec 9 16:51:08 CET 2020

Hi Norman

Thanks for your explanation.

>> We think the only proper fix would be a change in the file system 
>> session to allow the write operation to report back the number of
>> bytes written. Is there a specific reason to keep the write
>> 'fire-and-forget', other than simplicity?
> 
> I'm afraid that your assessment is not correct. There is indeed a
> "specific reason" behind the design. Let me explain.
> 
> Intuitively, letting the write operation return the number of written
> bytes would be a no-brainer. This is what is suggested by the POSIX
> write function after all. In practice, however, this approach implies
> two hard problems:
> 
> 1. To know the number of bytes written, the caller has to wait for
>    the acknowledgement of the write operation. This synchronization
>    point adds the end-to-end latency of the write operation to each
>    individual write request. This is particularly bad when issuing
>    sequences of write operations. Effectively, the need for waiting
>    for the acknowledgement removes the opportunity of batching file-
>    system requests. In a component-based system where we need to
>    consider a chain of file-system servers, this problem is amplified.
> 
>    In contrast, our design hides write latency by using the principal
>    technique of pipelining. The contract of the write operation is
>    simple: if the client is able to enqueue the write request into the
>    file-system session's packet stream, the operation is successful.
>    The client can immediately resume execution regardless of the
>    latency of the write operation.
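To make the "success means enqueued" contract concrete, here is a minimal C++ sketch of a bounded request queue. The names `Packet_queue`, `submit_write`, and `process_batch` are hypothetical and are not Genode's packet-stream API. The sketch only models how a client can submit several writes back to back without waiting for acknowledgements, while the server later handles them as one batch.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical model of a bounded request queue between a file-system
// client and server. NOT Genode's Packet_stream API; it only illustrates
// the contract "a write succeeds once it is enqueued".
struct Write_packet
{
    std::string data;
};

class Packet_queue
{
    std::deque<Write_packet> _queue;
    std::size_t const        _capacity;

public:
    explicit Packet_queue(std::size_t capacity) : _capacity(capacity) { }

    // Client side: returns true as soon as the packet is enqueued. The
    // client does not wait for the server to acknowledge the write.
    bool submit_write(std::string data)
    {
        if (_queue.size() == _capacity)
            return false;  // queue saturated: a real client would have to wait here
        _queue.push_back(Write_packet { std::move(data) });
        return true;
    }

    // Server side: handles all pending packets as one batch, in order,
    // appending their payloads to 'file'. Returns the number processed.
    std::size_t process_batch(std::string &file)
    {
        std::size_t processed = 0;
        while (!_queue.empty()) {
            file += _queue.front().data;
            _queue.pop_front();
            processed++;
        }
        assert(_queue.empty());
        return processed;
    }

    std::size_t pending() const { return _queue.size(); }
};
```

For example, three consecutive `submit_write` calls return immediately, and a single `process_batch` call afterwards appends all three payloads in order. A design that waits for each acknowledgement would instead pay one round trip per write.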

Very true, but this also applies to the read operation, where such
behavior cannot be avoided.

> 2. Assuming that we reflected the number of written bytes to the
>    client, what would a client do with this information? There are
>    two likely answers.
> 
>    (a) The client ignores this information. This is what happens
>        in the real world for most users of the POSIX write call.
>        Partial writes are generally not anticipated by application
>        software because they don't happen on commodity systems.
>        In this case, data would be lost.
> 
>        Anecdotally, we have repeatedly encountered this problem
>        with ported software using previous versions of our VFS/libc,
>        which happened to reflect partial writes to the applications
>        at that time.

True, but on Linux a non-blocking write to a full pipe does not transfer
anything at all: write returns -1 and sets errno to EAGAIN.
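The Linux behavior can be checked with a small POSIX program. The sketch below (the helper `fill_nonblocking_pipe` and the struct `Full_pipe_result` are illustrative names) sets `O_NONBLOCK` on the write end of a pipe and writes 4096-byte chunks until the kernel refuses. Since 4096 bytes does not exceed `PIPE_BUF` on Linux, POSIX requires each such write to be all-or-nothing, so the loop ends with `write` returning -1 and `errno` set to `EAGAIN` rather than with a partial write. Note that a non-blocking write larger than `PIPE_BUF` to a pipe that still has some room may be partial.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Outcome of filling a pipe whose write end is non-blocking.
struct Full_pipe_result
{
    long bytes_buffered;  // bytes accepted before the pipe was full
    int  err;             // errno reported by the first refused write
};

Full_pipe_result fill_nonblocking_pipe()
{
    int fds[2];
    int const rc = pipe(fds);
    assert(rc == 0);  // setup check for this sketch
    (void)rc;

    // Put the write end into non-blocking mode (O_NONBLOCK).
    int const flags = fcntl(fds[1], F_GETFL);
    fcntl(fds[1], F_SETFL, flags | O_NONBLOCK);

    // Each chunk is 4096 bytes, which does not exceed PIPE_BUF on Linux,
    // so every write either fits entirely or transfers nothing.
    char chunk[4096] = { };
    Full_pipe_result result { 0, 0 };
    for (;;) {
        ssize_t const n = write(fds[1], chunk, sizeof(chunk));
        if (n < 0) {
            result.err = errno;  // expected: EAGAIN, no partial write
            break;
        }
        result.bytes_buffered += n;
    }

    close(fds[0]);
    close(fds[1]);
    return result;
}
```

The application thus learns explicitly that no progress was possible, instead of having its data silently queued or lost.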

>    (b) The client would respond by issuing another write operation
>        with the remaining content. In a scenario where a write
>        operation can only be done partially (e.g., the pipe is full, as
>        in your example), this approach would ultimately result in a
>        busy loop.

There are good reasons to implement an application with non-blocking
writes and select. The most common one is avoiding the complexity of
thread synchronization.
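A minimal single-threaded sketch of this pattern, assuming a POSIX system: the writer fills a non-blocking pipe until `write` fails with `EAGAIN`, then asks `select` whether the descriptor is writable instead of retrying in a loop. The helper names `writable_now` and `nonblocking_select_demo` are made up for illustration, and the reader is simulated in the same thread only to keep the example small. A real event loop would pass a null timeout to `select` and sleep until the reader makes room.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/select.h>
#include <unistd.h>

// Returns true if 'fd' is writable right now (select() with zero timeout).
bool writable_now(int fd)
{
    fd_set wfds;
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    timeval timeout { 0, 0 };  // poll only, do not sleep
    int const n = select(fd + 1, nullptr, &wfds, nullptr, &timeout);
    return n > 0 && FD_ISSET(fd, &wfds);
}

// Returns true if the non-blocking write + select pattern behaved as expected.
bool nonblocking_select_demo()
{
    int fds[2];
    if (pipe(fds) != 0)
        return false;
    fcntl(fds[1], F_SETFL, fcntl(fds[1], F_GETFL) | O_NONBLOCK);

    char buf[4096] = { };
    long buffered = 0;

    // 1. Fill the pipe until write() refuses with EAGAIN.
    ssize_t n;
    while ((n = write(fds[1], buf, sizeof(buf))) > 0)
        buffered += n;
    bool const got_eagain = (n < 0 && errno == EAGAIN);
    assert(buffered > 0);

    // 2. select() reports "not writable": an event loop would sleep here
    //    instead of spinning on write().
    bool const must_wait = !writable_now(fds[1]);

    // 3. The reader (simulated in the same thread) drains the pipe.
    while (buffered > 0) {
        ssize_t const r = read(fds[0], buf, sizeof(buf));
        if (r <= 0)
            break;
        buffered -= r;
    }

    // 4. The write end is writable again and the next write succeeds.
    bool const resumed = writable_now(fds[1])
        && write(fds[1], buf, sizeof(buf)) == static_cast<ssize_t>(sizeof(buf));

    close(fds[0]);
    close(fds[1]);
    return got_eagain && must_wait && resumed;
}
```

In this pattern neither partial writes nor busy looping occur: the application learns that no progress is possible and waits in `select` until it is.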

> In short, our design tries to leverage async I/O for hiding the latency
> of write operations, and it yields the desired behavior of blocking
> instead of busy looping when no progress can be made.

As far as I understand, this leads to a blocking write when all buffers
are full, even when the application requested a non-blocking write. I'm
not convinced this is the best solution for compatibility when porting
POSIX/libc applications.

Also, when the write fails for another reason, such as a lack of disk
space, the application can't detect that problem and data may be lost.

>> We are still trying to introduce a named pipe for communication between
>> libc components and pure genode components. Our vfs fifo pipe now works
>> with multiple threads in the same process, but only when it's hosted in
>> the vfs of that same component. As soon as we host it in a separate vfs
>> component, it stalls randomly.
> 
> My interpretation of the scenario:
> 
> - In general, multiple operations can be enqueued to a single file-
>   system session. The VFS server processes the submitted operations
>   strictly in order. The file-system session is a serialization point.
> 
> - The VFS pipe plugin introduces a dependency of write operations on
>   read operations. If the pipe is full, a write operation has to stall
>   until a reader has consumed data from the pipe.
> 
> - The pipe buffer is bounded.
> 
> - Your client uses a single file system session to submit both read and
>   write requests to the VFS server.
> 
> 
> What happens:
> 
> The pipe buffer is saturated by previous write operations.
> 
> The client issues a write operation that exceeds the remaining capacity
> of the pipe buffer. Consequently, the write request stays in the packet
> stream to be picked up by the VFS server the next time data can be
> consumed. Each time the VFS server observes I/O, it tries to resume the
> write operation. This is done piece by piece until the entire request is
> completely processed. In your case, the write would stall until the pipe
> buffer has gained some new room.
> 
> As file operations are processed strictly in order, the partially
> processed write operation clogs up the file-system session's packet
> stream. This is because the file-system session is a serialization point.
> 
> The client submits a read operation to the file-system session. Even
> though the operation got enqueued, the VFS server never looks at it
> because it is still concerned with the not-yet-completed write operation.
> 
> A deadlock occurs because the read operation, which is second in the
> queue, would be required for the progress of the write operation (first
> in the queue).
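The interlocking can be reproduced with a toy model. In the C++ sketch below, a session is a FIFO of requests that the server serves strictly in order, and the pipe is a bounded byte counter. All names (`Op`, `Request`, `Pipe`, `try_progress`, `serve_in_order`) are hypothetical and abstract away the real VFS server and pipe plugin; the model only captures the head-of-line blocking described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>

// Toy model (hypothetical names): a request moves bytes into or out of a
// bounded pipe and may need several attempts to complete.
enum class Op { WRITE, READ };

struct Request
{
    Op          op;
    std::size_t remaining;  // bytes still to transfer
};

struct Pipe
{
    std::size_t fill;      // bytes currently buffered
    std::size_t capacity;  // bound of the pipe buffer
};

// Moves as many bytes as the pipe state allows; returns the amount moved.
std::size_t try_progress(Request &r, Pipe &p)
{
    std::size_t n = 0;
    if (r.op == Op::WRITE) {
        n = std::min(r.remaining, p.capacity - p.fill);
        p.fill += n;
    } else {
        n = std::min(r.remaining, p.fill);
        p.fill -= n;
    }
    r.remaining -= n;
    assert(p.fill <= p.capacity);
    return n;
}

// Serves one session strictly in order: only the head request is ever
// examined. Returns true if all requests complete, false if the head
// stalls (the requests queued behind it are never looked at).
bool serve_in_order(std::deque<Request> session, Pipe pipe)
{
    while (!session.empty()) {
        Request &head = session.front();
        std::size_t const moved = try_progress(head, pipe);
        if (head.remaining == 0) {
            session.pop_front();  // request completed
            continue;
        }
        if (moved == 0)
            return false;         // no progress possible: interlocked
    }
    return true;
}
```

With a pipe of capacity 8 that already holds 6 bytes, the session `{WRITE 4, READ 4}` gets stuck after the write has transferred 2 bytes, whereas `{READ 4, WRITE 4}` drains completely.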
> 
> 
> What can you do about it?
> 
> The interlocking of inter-dependent read and write operations in one
> data channel must be avoided. In a multi-component scenario, the reader
> and the writer are separate components, each having its own file-system
> session. So this situation does not occur.
> 
> For your single-component scenario, you may consider using two
> file-system sessions, one for the reading end and one for the writing
> end of the pipe. Both sessions would be routed to the same VFS server.
> 
>   <vfs>
>     ...
>     <dir name="reader"> <fs label="pipe"/> </dir>
>     <dir name="writer"> <fs label="pipe"/> </dir>
>   </vfs>
> 
> This way, read and write operations cannot interlock.
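Extending the same kind of toy model to several sessions shows why the two-session configuration helps. The hypothetical `serve_sessions` looks at the head request of every session in each round, so a write stalled in one session no longer hides a read queued in another. As before, all names are illustrative and this is an abstraction, not the VFS server's actual scheduling code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Toy model (hypothetical names), same as for the single-session case.
enum class Op { WRITE, READ };

struct Request { Op op; std::size_t remaining; };  // bytes still to transfer
struct Pipe    { std::size_t fill; std::size_t capacity; };

// Moves as many bytes as the pipe state allows; returns the amount moved.
std::size_t try_progress(Request &r, Pipe &p)
{
    std::size_t n = 0;
    if (r.op == Op::WRITE) {
        n = std::min(r.remaining, p.capacity - p.fill);
        p.fill += n;
    } else {
        n = std::min(r.remaining, p.fill);
        p.fill -= n;
    }
    r.remaining -= n;
    assert(p.fill <= p.capacity);
    return n;
}

// Each session is still served in order, but every round examines the head
// of every session. Returns true once all sessions are drained, false if a
// whole round makes no progress.
bool serve_sessions(std::vector<std::deque<Request>> sessions, Pipe pipe)
{
    for (;;) {
        bool all_empty = true;
        bool progress  = false;
        for (auto &session : sessions) {
            if (session.empty())
                continue;
            all_empty = false;
            Request &head = session.front();
            if (try_progress(head, pipe) > 0)
                progress = true;
            if (head.remaining == 0) {
                session.pop_front();  // request completed
                progress = true;
            }
        }
        if (all_empty)
            return true;
        if (!progress)
            return false;             // every head is stalled
    }
}
```

With the writer and the reader in separate sessions, the stalled write completes once the read has drained part of the pipe. Putting both requests into one session reproduces the stall.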

Thanks, this solves our problem.

>> This leads to a permanent blocking
>> of the vfs component as no read operation can alleviate the full buffer
>> of the fifo pipe.
> 
> The statement puzzles me because the VFS server must never block. Are
> you sure that the server is blocking, not merely stalling a single
> session? Please connect an unrelated component to the VFS server to see
> whether it remains responsive or not. The latter case would be a bug (of
> the VFS server, the VFS library, or one of the used plugins).

You are of course correct: the vfs component doesn't block but retries
the same write later without success. A read from another session works
just fine and lets the write succeed.

Bests
Stefan

-- 
Kind regards

Stefan Thöni
Chairman of the Board
Senior Security Architect
+41 79 610 64 95

gapfruit AG
Baarerstrasse 135
6300 Zug
https://gapfruit.com
