Hi Johannes,
As far as I understood, a container is basically a filesystem image with some configuration of how to set things up. The container runtime will read the configuration and prepare everything depending on the target system before it launches the process defined by the container.
Yes, except that if we want the whole infrastructure to be applicable (all the container management tools such as Kubernetes, etc.), then we need to support the appropriate API at the host level, which in turn requires: 1. golang support, 2. some containerization features such as controllable communication between containers, 3. COW at the file-system level (optional).
After that, the started container is merely a standard process that has been encapsulated with namespaces, cgroup and other isolation mechanisms. The process performs syscalls just like a non-containerised process would do.
Definitely so, although there are some things «around» the process, e.g. a way to execute a process inside an existing container, error handling, etc.
Thus, when thinking about running a container on Genode, I noticed we have most ingredients already in stock since a Genode component is a sandboxed process with its resource quota and local namespace.
Partially (mostly) yes, although some of the surrounding infrastructure is missing.
Regarding the file system virtualisation, we have the VFS and can even host a shared VFS in a dedicated server component. I'm not sure about a copy-on-write feature, though.
The main idea behind sharing was scalability. What happens if we try to run the same executable (e.g. Apache with 10 MB of code) in 2 different containers? If these are VM-like containers, then with high probability we will not share the code pages (e.g. on Windows some DLL code pages contain variables, so even for the same executable the page contents could differ). Imagine you have 100 containers with Apache: they will eat 10 MB x 100 = about 1 GB of RAM just for code pages, while it could be only 10 MB.
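The arithmetic can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and the inputs are the figures from the Apache example (10 MB of code, 100 containers).

```python
MIB = 1024 * 1024

def code_page_footprint(code_bytes, containers, shared):
    """RAM used by the code pages of one executable across all containers."""
    # With page sharing, every container maps the same physical pages,
    # so the cost is paid once. Without it, the cost is paid per container.
    return code_bytes if shared else code_bytes * containers

private = code_page_footprint(10 * MIB, 100, shared=False)
shared = code_page_footprint(10 * MIB, 100, shared=True)
print(private // MIB, "MiB without sharing")
print(shared // MIB, "MiB with sharing")
```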
Another problem is memory distribution. Imagine that you have, for example, a kernel object descriptor of 25 bytes, and a lot of them. If you have a single OS image, then you have a single memory allocator (on Linux: slab/slub/etc.), and you can store object instances belonging to different containers on the same memory page.
If every container has its own copy of everything, then again you not only waste kernel memory on unused tails, but also spend memory bandwidth, etc.
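To put rough numbers on the "unused tails" point, here is a small sketch. The 25-byte descriptor comes from the example above; the 4096-byte page size, the 100 containers and the 10 objects per container are assumptions chosen for illustration, and real slab allocators add per-object overhead and alignment that are ignored here.

```python
import math

PAGE_SIZE = 4096   # assumed page size in bytes
OBJ_SIZE = 25      # descriptor size from the example above

def pages_needed(num_objects, obj_size=OBJ_SIZE, page_size=PAGE_SIZE):
    """Pages needed to pack num_objects fixed-size objects back to back."""
    per_page = page_size // obj_size
    return math.ceil(num_objects / per_page)

def pages_separate(containers, objs_per_container):
    """Every container has its own allocator, so its own partly filled pages."""
    return containers * pages_needed(objs_per_container)

def pages_shared(containers, objs_per_container):
    """One allocator packs objects of all containers onto the same pages."""
    return pages_needed(containers * objs_per_container)

print(pages_separate(100, 10), "pages with per-container allocators")
print(pages_shared(100, 10), "pages with a single shared allocator")
```

With these assumed numbers each container fills only a small part of its own page, which is exactly the tail waste described above.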
If we want to share files effectively, they should be visible under the same «inode» (or similar, depending on the file system), so the file system should be visible from every container via a single FS instance. It should handle COW as well.
Then again, the FS is not the first priority, e.g. the Windows version of native (non-Linux) containers does not have any dedicated FS (it just uses slow layers on top of NTFS).
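The "single FS instance plus COW" idea can be illustrated with a toy model. This is a sketch under strong assumptions: the class names are hypothetical, it has nothing to do with the actual Genode VFS API, and it copies at whole-file granularity where a real file system would work on blocks or pages.

```python
class SharedStore:
    """Single FS instance: exactly one copy of every base file."""
    def __init__(self, files):
        self.files = dict(files)

class ContainerView:
    """Per-container view: reads fall through, writes stay private (COW)."""
    def __init__(self, store):
        self.store = store
        self.private = {}   # only the files this container has modified

    def read(self, path):
        if path in self.private:
            return self.private[path]
        return self.store.files[path]

    def write(self, path, data):
        # The shared copy is never touched; the change is visible
        # only through this view.
        self.private[path] = data

    def is_shared(self, path):
        return path not in self.private

store = SharedStore({"/usr/sbin/httpd": b"apache code", "/etc/conf": b"port 80"})
a, b = ContainerView(store), ContainerView(store)
a.write("/etc/conf", b"port 8080")
print(a.read("/etc/conf"), b.read("/etc/conf"))
```

Unmodified files stay one object for all containers, which is what makes sharing of code pages possible in the first place.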
In my (current) point of view, enabling containerised workloads on Genode probably requires three ingredients:
- Implementing additional VFS plugins for mounting container images,
overlays, and cow functionality.
Not 100% necessary: you can run (slowly!) from just a tar, this is supported by Docker.
- Adding missing plugins for special file nodes in devfs, sysfs or
procfs. This highly depends on what the particular container process expects, though.
This requires an answer to the question: which model do you want to use? If the Linux one, then yes, /proc, /sys, /dev and so on. But my porting experience shows that this is a tricky path: if you try to pretend to be Linux, you need a full-size emulation of the Linux facilities, APIs, etc. Every time you try to do something, it forces you into the tricky Linux way. For the golang runtime I chose NetBSD as a base, as the less advanced one, adding some features from other systems. E.g. it could use some exotic TCP or fcntl options to force the container to do something.
IMHO there is no simple answer, it is always a pro/con trade-off.
- Implementing a container runtime for Genode that sets up a sub-init
to launch the container process with the appropriate VFS and helper components according to the container configuration.
Again, the same question as above. Typically you could use something like tini (tiny init) for such purposes, although it is not mandatory, and many apps will work without it. But you need to understand what happens to child processes inside the container and who owns them after the parent dies (or this should never happen, and you can use the app itself as a pseudo-init).
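The ownership question can be made concrete with a toy process table. This is a sketch of the usual Unix rule (orphans are reparented to the init process of the container), not of anything that exists on Genode; the class and its methods are hypothetical.

```python
class ProcessTable:
    """Toy model of a container's process tree with an init at pid 1."""
    INIT_PID = 1

    def __init__(self):
        self.parent = {self.INIT_PID: None}   # pid -> parent pid
        self._next_pid = 2

    def spawn(self, ppid):
        if ppid not in self.parent:
            raise KeyError(f"no such parent: {ppid}")
        pid = self._next_pid
        self._next_pid += 1
        self.parent[pid] = ppid
        return pid

    def exit(self, pid):
        if pid == self.INIT_PID:
            raise ValueError("init exiting would end the whole container")
        # Orphaned children are handed over to init, which is then
        # responsible for reaping them.
        for child, ppid in self.parent.items():
            if ppid == pid:
                self.parent[child] = self.INIT_PID
        del self.parent[pid]

table = ProcessTable()
app = table.spawn(ProcessTable.INIT_PID)
worker = table.spawn(app)
table.exit(app)
print("worker is now owned by pid", table.parent[worker])
```

If the app itself plays the role of pid 1, the same rule means it has to be prepared to receive and reap processes it never started.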
Re. 3., I'm uncertain whether this is best approached from scratch or by porting existing runtimes such as runc or crun. The downside of the
It is too big to be rewritten. I considered both ways and found that porting is easier for me; I have really extensive porting experience.
latter approach is that it requires us to provide all the Linux management interfaces such as cgroup, namespaces, etc. and map these to Genode sub-init configuration. Parsing the container configuration and applying the appropriate actions directly seems more natural to me at the moment.
IMHO it is easier to tailor (cut) the appropriate code from the existing golang sources for different platforms than to write it from scratch.
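For comparison, the direct approach from the quoted text (parsing the container configuration and generating a sub-init configuration) might start out roughly like the sketch below. The fields read from the OCI config.json (process.args, process.env, linux.resources.memory.limit) are from the OCI runtime specification; the mapping onto a Genode start node, the default RAM quota and the caps value are pure assumptions for illustration, and everything related to the VFS, routing and namespaces is left out.

```python
import json
import xml.etree.ElementTree as ET

def oci_to_start_node(oci_config, name, default_ram="64M", caps=200):
    """Translate a (parsed) OCI config.json into a Genode-style <start> node.

    Hypothetical mapping: default_ram and caps are made-up defaults.
    """
    process = oci_config.get("process", {})
    args = process.get("args", [])
    if not args:
        raise ValueError("OCI config has no process.args")

    # A memory limit in bytes becomes the RAM quota; a missing or
    # non-positive limit (unlimited) falls back to the assumed default.
    limit = (oci_config.get("linux", {}).get("resources", {})
             .get("memory", {}).get("limit"))
    ram = f"{limit // (1024 * 1024)}M" if limit and limit > 0 else default_ram

    start = ET.Element("start", name=name, caps=str(caps))
    ET.SubElement(start, "binary", name=args[0])
    ET.SubElement(start, "resource", name="RAM", quantum=ram)

    # Arguments and environment go into the component's <config> node.
    config = ET.SubElement(start, "config")
    for arg in args:
        ET.SubElement(config, "arg", value=arg)
    for entry in process.get("env", []):
        key, _, value = entry.partition("=")
        ET.SubElement(config, "env", key=key, value=value)
    return start

oci = json.loads("""{
  "process": {"args": ["/usr/sbin/httpd", "-DFOREGROUND"],
              "env": ["PATH=/usr/bin"]},
  "linux": {"resources": {"memory": {"limit": 134217728}}}
}""")
print(ET.tostring(oci_to_start_node(oci, "apache"), encoding="unicode"))
```

The easy part is shown here; the hard part (everything the container expects from namespaces, cgroups and the file system) is exactly where the port-versus-rewrite trade-off lies.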
@Alexander: What do you think are the major road blocks for running a first container image on Genode?
It depends on the definition of «run». To run a Docker container in my first project on a new OS, I ported 3 coupled apps: runc, containerd and the docker CLI. They are connected via Unix sockets. Then you have an API to create/delete/manipulate containers.
I don't see any fundamental roadblocks, although I think that having a Linux-compatible API for namespaces and cgroups would help a lot, while again it is not mandatory.
We need to invent a way to manipulate a set of processes (signals in the Unix sense are virtually absent on Genode), to connect and reconnect pipes (Unix sockets), and to run something chroot()-like (is anything like this available on Genode?).
I expect a lot of work related to tuning/utilization of some features used for container manipulation (e.g. file descriptor flags, options related to exec/fork and similar), although Windows containers can definitely exist without them (they use the native Windows API).
But all these problems are probably solvable, IMHO. Maybe we need to implement something inside the libc port as well.
Alexander