Hello Uwe,
On 22.04.21 21:50, Uwe wrote:
I have seen the video (0). At the end there is the question of how to make drivers for block devices restartable. The only limiting factor for this is state, according to the video. To store state in restartable processes, Raft (1) was invented. This implementation in particular is modular enough to be adapted to store its data on raw block devices.
thanks for commenting on my talk and for sharing your perspective.
The stance I expressed is primarily an economic one. For the driver classes covered by the talk (network, graphics, input), the architectural change towards pluggable device drivers actually reduced the interface complexity, which is beautiful. The main concern was drivers on the order of 100 thousand lines of code (think of the Wi-Fi or Intel graphics drivers), which are no longer critical for the liveness of the system. So when looking at the effort, complexity implications, and benefit, this change is a clear win.
For block drivers, the equation looks different.
First, each of our block drivers (NVMe, AHCI, USB block) comprises about 1500 lines of code. With this rather low complexity, we can attain robustness by means of code-quality measures. When equating risk with source-code complexity, the pressure to make block drivers restartable is two orders of magnitude weaker compared to the drivers mentioned above.
Second, the restartability mechanism would introduce new risks. We are faced with the problem of replicating large amounts of state. Even though solutions exist, they add complexity instead of removing it. Intuitively, I think that a solution would likely exceed the 1500 lines of code of the block driver. So the net gain from a risk-assessment perspective is dubious.
That being said, please don't take my words as discouragement. If you are interested in exploring this direction, I'd be delighted to learn about your findings.
Cheers Norman