Hi Alexander,
I believe sharing code pages in Genode would be a matter for the parent component that sets up the children's address spaces. Currently, we use the sandbox library for this.
I assume that this sharing is implemented at the boundary between the file system and the page cache (at least this is true for Linux/Unix and Windows).
I'm not sure whether I can follow your thoughts here exactly. In Genode, the existence of a component (i.e. process) is not necessarily tied to a file system. Typically, the binary of a component is either loaded from a boot image or indirectly loaded from a persistent or volatile file system. By indirectly I mean that the binary is not accessed as a file but as a ROM module (i.e. via a Rom_session). The init component that is normally used for instantiating subsystems, and which uses the sandbox library, is therefore not aware of a file system; it must only be provided a means to access ROM modules. In principle, I believe it should be possible to share code pages of the same ROM module between multiple children, but this is currently not implemented in the sandbox library. My previous reply was probably a bit unclear in this matter.
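To illustrate the indirect case, here is a hedged sketch of an init configuration in which a VFS server exports a file system and an fs_rom-style translator turns files into ROM modules for init's children. All names, quotas, and the archive name are assumptions for illustration, not taken from a concrete run script:

```xml
<!-- sketch: vfs exports a File_system service backed by a tar archive;
     fs_rom translates files from that file system into ROM modules -->
<start name="vfs">
  <resource name="RAM" quantum="8M"/>
  <provides> <service name="File_system"/> </provides>
  <config>
    <vfs> <tar name="binaries.tar"/> </vfs> <!-- assumed archive name -->
    <default-policy root="/" writeable="no"/>
  </config>
</start>

<start name="fs_rom">
  <resource name="RAM" quantum="4M"/>
  <provides> <service name="ROM"/> </provides>
  <route>
    <service name="File_system"> <child name="vfs"/> </service>
    <any-service> <parent/> </any-service>
  </route>
</start>
```

Children whose ROM route points at fs_rom would then receive their binaries from the file system without being aware of it themselves.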
In that case, we would need a single VFS server with its own cache/page mapping for files shared between different instances of containers (subsystems), not only for its children? Is this true for the current implementation of [single VFS+FS server] <=> [[multiple subsystems]]?
I'm afraid you lost me. In Genode, a file system is accessed via a File_system session. This session provides an API for typical file/directory operations (open/create, symlink, watch, move). File content is transferred via a packet stream (cf. Genode Foundations book). A VFS server would access e.g. a persistent file system and deliver its contents to its own clients, which could be separate subsystems. I see two places for caching here: First, the VFS server could cache some file content so that it can be delivered to multiple clients without transferring it from the block device multiple times. Second, the clients can perform their own (local) caching. Since I'm not familiar with the internal implementation though, I don't know to what extent such mechanisms are already implemented.
If we want to share files effectively, they should be visible with the same «inode» (or similar, depending on the file system); hence the file system should be visible from every container via a single FS instance. It should handle COW as well.
I think this is exactly what a VFS server component does. It provides a File_system service to which multiple components can connect.
do you have an example of implementation of combination of VFS+FS server and a set of subsystems (at least 2) connected to the single server instance?
Maybe a good start would be to have a look at `repos/ports/run/bash.run`. It comprises a VFS server that is accessed by a bash component via the file-system session and via a ROM session (with vfs_rom in between) for accessing binaries. It is pretty straightforward to extend this example to contain two bash components.
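The extension could be sketched as two `<start>` nodes that route their File_system session to the same VFS server. The names `vfs`, `bash_1`, and `bash_2` as well as the quota values are made up for illustration and do not come from `bash.run` itself:

```xml
<!-- sketch: two bash instances as clients of one VFS server -->
<start name="bash_1" caps="500">
  <binary name="bash"/>
  <resource name="RAM" quantum="32M"/>
  <route>
    <service name="File_system"> <child name="vfs"/> </service>
    <any-service> <parent/> </any-service>
  </route>
</start>

<start name="bash_2" caps="500">
  <binary name="bash"/>
  <resource name="RAM" quantum="32M"/>
  <route>
    <service name="File_system"> <child name="vfs"/> </service>
    <any-service> <parent/> </any-service>
  </route>
</start>
```

Since both routes point at the same `vfs` child, both bash instances operate on the same file-system instance, which matches the single-FS-instance scenario discussed above.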
- Implementing a container runtime for Genode that sets up a sub-init to launch the container process with the appropriate VFS and helper components according to the container configuration.
Again, same question as above. Typically, you could use something like tinit (tiny init) for such purposes, though it is not mandatory and many apps will work without it. But you need to understand what happens to child processes inside the container and who will own them after the death of the parent (or whether this should not happen at all, so that the app itself can serve as a pseudo-init).
Sorry, I was not crystal clear in my terminology. By "sub-init", I meant Genode's init component that we use for spawning subsystems. Honestly, I haven't spent any thought on multi-process containers. I had the impression that most commonly a container merely runs a single process, i.e. does not spawn new processes on its own.
This is not exactly true. While containers were initially developed with such an idea, they later became more complex.
Imagine a build container: it runs make inside (which forks gcc, which in turn forks cpp, then cc1, then as, then ld, and maybe ar/ranlib/objcopy/etc.), and with make -j4, make will run 4 compilations in parallel (if the Makefile allows). They must all use the same file-system instance (volume) to process intermediate files like .c -> .i -> .s -> .o -> a.out...
Returning to Genode and subsystems: how is this implemented at the moment? E.g., how can (native) make run inside Genode's noux? Presumably it uses libc fork()/exec()/etc. together with pthreads? Do the processes (threads in Genode terminology) share anything by default after start?
I'm not familiar with the implementation of the pthreads library or noux. The latter is basically retired (see release notes 20.05) and superseded by Genode's C runtime and the VFS server. Yet, the C runtime transparently implements fork/execve. Following the recursive system structure, the child processes would have a similar environment as the parent process. However, I'm not familiar with the defaults.
Can I run a bunch of «processes» inside Genode in a single subsystem, sharing some services from outside (like VFS+FS)?
That's basically the default due to the recursive system structure.
A more interesting question: do they share a single swap-to-disk service if needed? Or does every subsystem have its own pager with its own page file?
I'm not aware of any swapping to disk feature in Genode.
I think that if I have examples of these features implemented in a way that suits Genode's subsystem-per-container model, then we can have Docker on Genode relatively fast.
That sounds superb.
Cheers Johannes