Hello Stefan,
we have a requirement for additional filesystem tests, especially to measure performance and check reliability in reset/crash scenarios. The tests should be modular to cover different file systems and block devices, including CBE, as well as different platforms to detect any regressions.
Would such tests be generally welcome as an upstream contribution to the Genode framework from us?
I'm admittedly struggling to give you a clear-cut answer.
On the one hand, I see the benefits of a strong base of tests, in particular bad-case tests that stress corner cases. So your offer seems generous.
On the other hand, I'm wary of the social pressure and secondary costs that come with accepting contributions in areas that are rather low on our priority list. In particular, we have no current plan to put file-system performance into the spotlight in the near future. So I see the risk of being distracted from plans that are much closer to our heart. Even if high-quality tests are contributed, their integration and maintenance can still be extremely costly, sometimes much more so than we are comfortable with while overriding existing commitments.
If file-system performance were our priority, I would intuitively know where to look first, without new benchmarks. In fact, existing scenarios like the tool-chain test already amplify bottlenecks and could thereby be readily taken as a basis for performance analysis, asking questions like:
- Why does the 'cp -r' operation of the tool-chain test take so long?
- What would be the effect of moving the symlink resolution of path elements from the libc to the VFS library?
- What would be the effect of caching or batching stat calls? What's the speedup when implementing stat as an async file-system-session operation instead of an RPC?
- What benefits could be had by allocating file-system node handles at the client side?
- What would be the effect of delivering dir entries in batches?
- What's the effect of replacing the currently synchronous I/O backend of the vfs_rump plugin by issuing proper asynchronous I/O operations?
Concrete answers to those questions along with exemplary implementations would be very welcome contributions! Such an analysis requires a deep dive through many layers of the stack though.
Regarding file-system integrity and reliability, the answer is much easier. Data loss would be a critical bug. So if we are presented with a reproducible test that triggers such a case, we will immediately make it our top priority.
Cheers
Norman