Hi Genodians,
I've been playing with Sculpt VC again. Doing so taught me a lot more about routing.
This time I succeeded in creating a VM with partitions on two SATA disks. It runs NixOS inside, with the /home partition on an encrypted ZFS mirror and the root partition on a virtual disk, just like the Debian VM.
When I want to start the VM and ahci-1.part_blk is not running, the Runtime manager reports a diagnostic stating that vm-nixos needs ahci-1.part_blk. To start my VM, I inspect a different partition on that disk. That starts the part_blk, and Runtime starts my VM.
So far so good. The problems start here:
1. When the VM runs and I stop inspecting that partition, the Runtime manager kills my VM.
The manager was happy to start my VM when the resource became available but does not keep the part_blk alive when the inspector stops.
2. When the VM runs (and has a session to the part_blk for that partition), I can format that partition from the Storage component. I then get data corruption errors in ZFS on that disk (but ZFS lives on and repairs it from the other mirror).
That seems to be in violation of The Book para 4.5.8: """The part-block component requests a single block session, parses a partition table, and hands out each partition as a separate block session to its clients. There can be one client for each partition. """
I would expect that the formatter would get an error because it should not get a session to that partition.
Could someone shine a light on what's going on?
The log file is here: https://paste.wtmnd.nl/xfNMa3yY
I wrote a full write-up of my sculpting here: https://paste.wtmnd.nl/WKUQowg2 -- I intend to make it a blog post for others to learn from, hence the writing style.
Regards, Guido.
Hi Guido,
thanks for sharing your sculpting experience with us!
When I want to start the VM and ahci-1.part_blk is not running, the Runtime manager reports a diagnostic stating that vm-nixos needs ahci-1.part_blk. To start my VM, I inspect a different partition on that disk. That starts the part_blk, and Runtime starts my VM.
So far so good. The problems start here:
Just a word of caution: So far, most regular Sculpt users - myself included - use merely a single Sculpt file system for storage. Sculpt's mechanisms for inspecting and managing storage devices are there to select and use this single storage place. Your use case - assigning partitions or secondary block devices to components - goes beyond that established path. It is certainly possible - and I know of a few brave Sculpt users who are doing just that - but one needs to be extra careful.
This diagnostic message "vm-nixos requires ahci-1.part_blk" appears because your launcher for 'vm-nixos' has a route to a child named "ahci-1.part_blk".
From your description, I gather that you are using two disks. Let's call them primary and secondary, where the primary disk is the one that hosts the Genode partition. In order to access the Genode partition's file system, the Sculpt manager automatically spawns the part_blk and file-system components for the primary disk and the Genode partition and keeps them around as long as the Genode partition is "used".
For the secondary disk, the Sculpt manager needs these components only for the discovery, the inspection, or to perform disk operations. But if none of those operations are performed, there is no part_blk or file-system component running.
So far, you relied on Sculpt's automatism to access a partition on the secondary disk. By inspecting another partition of the disk, you forced the Sculpt manager to preserve the part_blk for the secondary disk. This is of course not a robust solution. You should instead add a custom instance of part_blk to your deploy config, which unconditionally stays there. It is better to give it a name other than "ahci-1.part_blk", e.g., "manual_ahci-1.part_blk". You can route the block session of this instance to <parent label="ahci-1"/> to let it access the device directly. Once your custom part_blk instance is running, you can route the vm-nixos' block session to "manual_ahci-1.part_blk".
With this approach, you won't need to rely on any side effect of the Sculpt manager's built-in policy. Note, however, that the inspection or management of the secondary disk is not expected to work as long as "manual_ahci-1.part_blk" is deployed. The AHCI-1 disk can be used by only one component at a time.
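For illustration, the setup described above could look roughly like the following two fragments. This is only a sketch under several assumptions: the pkg path is a placeholder (whether a ready-made part_blk pkg is available depends on your depot), the partition number and the policy label are made up, and whether the part_blk configuration can be given inline in the start node depends on the Sculpt version. Only the names "manual_ahci-1.part_blk", "vm-nixos", and the <parent label="ahci-1"/> route come from the discussion.

```xml
<!-- Fragment 1: start node in the deploy config for a custom part_blk
     instance that stays deployed unconditionally.
     ASSUMPTIONS: pkg path, partition number, and policy label are placeholders. -->
<start name="manual_ahci-1.part_blk" pkg="USER/pkg/part_blk/VERSION">
	<config>
		<!-- hand out one partition to the VM; "3" is a made-up number -->
		<policy label_prefix="vm-nixos" partition="3" writeable="yes"/>
	</config>
	<route>
		<!-- let this instance access the secondary disk directly -->
		<service name="Block"> <parent label="ahci-1"/> </service>
	</route>
</start>

<!-- Fragment 2: block-session route of vm-nixos, to be placed wherever the
     routes of vm-nixos are defined (e.g., its launcher). -->
<route>
	<service name="Block"> <child name="manual_ahci-1.part_blk"/> </service>
</route>
```

Because the route of vm-nixos refers to a child that is always present in the deploy config, it no longer depends on the part_blk instance that the Sculpt manager creates and removes on demand.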
2. When the VM runs (and has a session to the part_blk for that partition), I can format that partition from the Storage component. I then get data corruption errors in ZFS on that disk (but ZFS lives on and repairs it from the other mirror).
That seems to be in violation of The Book para 4.5.8: """The part-block component requests a single block session, parses a partition table, and hands out each partition as a separate block session to its clients. There can be one client for each partition. """
I would expect that the formatter would get an error because it should not get a session to that partition.
Admittedly, I haven't tried handing out one partition of part_blk to multiple clients at the same time. I agree that part_blk should outright deny this situation because it can only result from unintended routing rules.
I hope that I could shed some light on the mysteries you observed.
Cheers and happy sculpting!
Norman
Hi Norman,
Thanks for your quick and clear response.
On 12/18/18 12:03, Norman Feske wrote:
For the secondary disk, the Sculpt manager needs these components only for the discovery, the inspection, or to perform disk operations. But if none of those operations are performed, there is no part_blk or file-system component running.
I wonder why the manager did start my vm-nixos when part_blk became available but decided to kill it when the inspector closes. It knows there is a dependency.
With this approach, you won't need to rely on any side effect of the Sculpt manager's built-in policy. Note, however, that the inspection or management of the secondary disk is not expected to work as long as "manual_ahci-1.part_blk" is deployed. The AHCI-1 disk can be used by only one component at a time.
Bummer, I had in mind to dedicate that secondary partition to the vm-nixos with a policy and leave the rest to Sculpt.
But first, I'll give your suggestion a try. I'll probably learn a lot again.
(Or I'll give the whole disks to the VM, that's what I want in my desktop environment, eventually. ZFS loves whole disks.)
Cheers, Guido.
Hi Guido,
I wonder why the manager did start my vm-nixos when part_blk became available but decided to kill it when the inspector closes. It knows there is a dependency.
init restarts a component whenever any of the component's existing session routes change. This is to enforce the new routing policy.
In your scenario, 'vm-nixos' had a block session routed to 'ahci-1.part_blk'. When you closed the inspect window, the Sculpt manager decided to remove 'ahci-1.part_blk' because it was needed solely to support the inspect window. This removal, in turn, invalidated the block-session route of 'vm-nixos', which prompted the runtime's init to restart the 'vm-nixos' component. The restarted instance, however, could not establish its block-session route. So it got stuck at this point.
In short, the Sculpt manager did not affect the 'vm-nixos' directly. The observed killing was actually a restart, indirectly caused by the removal of the 'ahci-1.part_blk'.
Bummer, I had in mind to dedicate that secondary partition to the vm-nixos with a policy and leave the rest to Sculpt.
We will get there. But we are not there yet. ;-)
Cheers
Norman