Restartable block devices

Uwe geno.de at public-files.de
Thu Apr 22 21:50:46 CEST 2021


Hello all,
I have seen the video (0). At the end, the question comes up of how to make drivers for block devices restartable. According to the video, the only limiting factor is state. Raft (1) was invented to store state in restartable processes. This implementation in particular is modular enough to be adapted to store its data on raw block devices.
Another key part of the solution is the transactional application of log segments. This requires 3 fixed root blocks. Every time you want to apply a segment to a root block, you first overwrite a pointer (block number) in the root block with the address of the log segment you want to apply. From then on, whenever you read something from that root block or anything it points to, you apply that segment first. OK, you would never implement it that way. Instead, on restart you load a raft process with the data and continue to run from the loaded data as if you had written that pointer this instant. Next, you prepare the new root block by doing a copy-on-write (COW) operation on the data reachable from the old root block. Finally, you set the version field in the new root block and set its segment pointer to an empty value. Now you need to look at the version fields of all 3 root blocks on storage. There are 6 permutations of them, but only one of them will fulfill the equation
H(a,b)=c.   (with H as a cryptographic hash)
This permutation assigns a letter to each of the root blocks: a is the block to overwrite, b is the backup, and c is the current root block. The new version field is calculated as H(b,c), and the new root is written to a.
I think this allows fully transactional, restartable block devices. The root blocks can, but do not have to, be stored on the same device as the file system they manage.
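To make the rotation of the three root blocks concrete, here is a minimal in-memory sketch of the scheme as I understand it. Everything named here is an assumption for illustration: SHA-256 as H, the RootBlock layout, the helper names, the seed versions, and a Python tuple standing in for the data reachable from a root. It does not use the raft library and does not touch a real block device.

```python
import hashlib
from dataclasses import dataclass
from itertools import permutations
from typing import Optional


def H(x: bytes, y: bytes) -> bytes:
    """Version hash. SHA-256 is an assumption; any cryptographic hash would do."""
    return hashlib.sha256(x + y).digest()


@dataclass
class RootBlock:
    version: bytes
    segment: Optional[int] = None  # block number of the segment being applied, None = empty
    data: tuple = ()               # stand-in for everything reachable from this root


def find_roles(roots):
    """Return slot indices (a, b, c) such that H(version[a], version[b]) == version[c]."""
    for a, b, c in permutations(range(3)):
        if H(roots[a].version, roots[b].version) == roots[c].version:
            return a, b, c
    raise ValueError("no permutation of the root blocks satisfies H(a,b)=c")


def begin_apply(roots, segment):
    """Step 1: record the pointer to the log segment in the current root block."""
    _, _, c = find_roles(roots)
    roots[c].segment = segment


def commit(roots, log):
    """Step 2: build the next root from the current one and write it to slot a."""
    a, b, c = find_roles(roots)
    cur = roots[c]
    data = cur.data
    if cur.segment is not None:
        # Copy-on-write stand-in: a new tuple is built, the old root's data is untouched.
        data = data + (log[cur.segment],)
    roots[a] = RootBlock(version=H(roots[b].version, cur.version), segment=None, data=data)


def initial_roots():
    """Hypothetical initial state: two seed versions and their hash."""
    v0, v1 = b"seed-0", b"seed-1"
    return [RootBlock(v0), RootBlock(v1), RootBlock(H(v0, v1))]
```

After a commit the roles rotate by construction: the old backup becomes the next block to overwrite, the old current block becomes the backup, and the freshly written block is current, because its version is H(b,c). If the process restarts between begin_apply and commit, find_roles still returns the same current block with its segment pointer set, so running commit again redoes the same step from unchanged data.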
(0) https://video.fosdem.org/2021/D.microkernel/microkernel_pluggable_device_drivers_for_genode.webm
(1) https://github.com/canonical/raft


