Dear Genode community,
as some of you already know, I want to create a checkpoint/restore mechanism for components on Genode 16.05 and Fiasco.OC. I want to create a component which monitors the PD and CPU sessions of its children (= targets for checkpoint/restore) to have access to their memory and thread states.
For a complete checkpoint and restore of a child I also need to store the capabilities used by a child. How can I acquire and also restore all these capabilities?
After a restore of a component the capability space shall be the "same" as before the checkpoint: 1) The capabilities after the restore shall point to corresponding object identities. 2) Also the capabilities after the restore shall be on the same slot (have the same address) in the capability space as before the checkpoint.
The capability space resides in the kernel and Genode does not offer an API to manipulate it. Is there a way to accomplish my goal with Genode's API anyway?
Kind regards, Denis
Hello Denis,
After a restore of a component the capability space shall be the "same" as before the checkpoint:
1) The capabilities after the restore shall point to corresponding object identities. 2) Also, the capabilities after the restore shall be on the same slot (have the same address) in the capability space as before the checkpoint.
The capability space resides in the kernel and Genode does not offer an API to manipulate it. Is there a way to accomplish my goal with Genode's API anyway?
there is no ready-to-use solution via the Genode API because the way capabilities are handled vastly differs between the various kernels. Manipulating the capability space of a remote component wouldn't even be possible on some kernels. However, since you are using a specific kernel (Fiasco.OC) that provides an asynchronous map operation, the problem can be tackled in a kernel-specific way.
I would propose to extend the existing 'Foc_native_pd' RPC interface [1] with RPC functions for requesting and installing capabilities from/into the capability space of the PD.
[1] https://github.com/genodelabs/genode/tree/master/repos/base-foc/include/foc_...
The function for requesting a capability would have an iterator-like interface that allows the client to iterate over the PD-local selector numbers and sequentially obtain the underlying capabilities as Genode::Native_capability objects (which can be delegated via RPC). Each call would return a Genode::Native_capability and a selector number of the capability to request with the next RPC call. In a first version, you may simply iterate over all numbers up to the maximum selector number, returning invalid capabilities for unused selectors. The iterator-like interface would then be a performance optimization.
The function for installing a capability would take a Genode::Native_capability and the destination selector number as arguments. The implementation of the 'Foc_native_pd' interface resides in core, which has access to all capabilities. The implementation would directly issue Fiasco.OC system calls (most likely 'l4_task_map') to install the given capability into the targeted PD.
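As a rough sketch of how the extended interface could look (the names 'cap_at', 'next_sel', and 'install_cap' are invented on the spot, and I split the iterator step into two RPC functions to keep the marshalling simple):

  /* sketch of an extended 'Foc_native_pd' interface */
  struct Genode::Foc_native_pd : Pd_session::Native_pd
  {
    virtual Native_capability task_cap() = 0;  /* existing function */

    /* capability at slot 'sel' of the PD's cap space, invalid if unused */
    virtual Native_capability cap_at(addr_t sel) = 0;

    /* next used slot after 'sel', for skipping unused ranges */
    virtual addr_t next_sel(addr_t sel) = 0;

    /* install 'cap' at slot 'sel', using 'l4_task_map' within core */
    virtual void install_cap(Native_capability cap, addr_t sel) = 0;

    GENODE_RPC(Rpc_task_cap, Native_capability, task_cap);
    GENODE_RPC(Rpc_cap_at, Native_capability, cap_at, addr_t);
    GENODE_RPC(Rpc_next_sel, addr_t, next_sel, addr_t);
    GENODE_RPC(Rpc_install_cap, void, install_cap,
               Native_capability, addr_t);
    GENODE_RPC_INTERFACE(Rpc_task_cap, Rpc_cap_at, Rpc_next_sel,
                         Rpc_install_cap);
  };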
Does this sound like a reasonable plan?
Cheers Norman
Hello Norman,
your approach sounds really good and promising. But I have a problem when storing the Capabilities from the Cap Space of the child:
The child component shall be migrated from one ECU to another. The Genode system on the other ECU may have the Rpc_objects which the child needs (e.g. shared dataspaces), but their object identities are different (e.g. other addresses in memory), or the Rpc_objects may not exist at all (e.g. a session object between the child and a service).
During a restore, I will have to relink the Native_capability to the available Rpc_object or simply recreate the Rpc_object. In both cases I have to know the types of the Native_capabilities when I snapshot them from the Cap Space of the child. Is there a way to find out the type of a Native_capability through an API function?
If there is no ready-to-use function/approach, can I intercept the type to which a Native_capability is reinterpreted in Rpc_entrypoint::manage as a workaround solution?
Kind regards, Denis
Hi Denis,
The child component shall be migrated from one ECU to another. The Genode system on the other ECU may have the Rpc_objects which the child needs (e.g. shared dataspaces), but their object identities are different (e.g. other addresses in memory), or the Rpc_objects may not exist at all (e.g. a session object between the child and a service).
so the problem goes much deeper than merely requesting and populating the child's capability space. You need to replicate the entire execution environment of the child at the destination ECU. That means, for each capability in possession of the child, your runtime needs to know the exact meaning. E.g., if the child has a session capability to a session created with certain session arguments, the same kind of session must be re-created at the destination ECU. Of course, the same holds for all dataspaces, threads, and other RPC objects that the child can reference via the capabilities present in its capability space.
The logical consequence is that the runtime must virtualize all services used by the child. E.g., if the child creates a LOG session, the runtime would create a session to a LOG service in the child's name but hand out a capability to a locally implemented LOG-session wrapper - similar to what you have already done for the RAM service. So when migrating the child, you know exactly what the various capabilities in the child's capability space mean and can transfer the underlying state to the destination ECU.
In principle, this is how Noux solves the fork problem. But in the case of Noux, I deliberately avoid populating the child's capability space with Genode capabilities in order to alleviate the need to virtualize many Genode services. Instead, I let the child use the Noux session as its only interface to the outside world. At the Noux-session level, the child does not talk about Genode capabilities but about file descriptors, for which Noux knows the meaning. Of course there exist a few capabilities in the child's capability space, in particular the parent cap, the Noux-session cap, and the caps of the child's environment. But these few capabilities are manually re-initialized by the freshly created process after the fork.
In your case, you want to replicate the child's capability space in a way that is transparent to the child. Like Noux, you need to have a complete model of the child's execution environment in your runtime. Unlike Noux, however, you want to let the child interact with various Genode services. Consequently, your model needs to capture those services.
During a restore, I will have to relink the Native_capability to the available Rpc_object or simply recreate the Rpc_object. In both cases I have to know the types of the Native_capabilities when I snapshot them from the Cap Space of the child. Is there a way to find out the type of a Native_capability through an API function?
As discussed above, the type alone does not suffice. Your runtime needs to know the actual semantics behind each capability: e.g., not just that a certain capability is a RAM-session capability but also how much quota the RAM session has and which dataspaces belong to it. Or, as another example, you don't just need to know that a capability is a file-system session but also the session arguments that were used when the session was created.
If there is no ready-to-use function/approach, can I intercept the type to which a Native_capability is reinterpreted in Rpc_entrypoint::manage as a workaround solution?
Since your runtime needs to create a representative for each RPC object the child interacts with in the form of a locally implemented RPC object (managed by the runtime's entrypoint), you can in principle use the 'Rpc_entrypoint::apply' method to look up the local RPC object for a given capability.
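For illustration, assuming your monitor wraps LOG sessions in a hypothetical 'Log_session_component' class managed by its entrypoint 'ep', the lookup could take this shape:

  /* sketch: resolve 'cap' to the monitor's local wrapper object */
  ep.apply(cap, [&] (Log_session_component *session) {
    if (session)
      checkpoint_log_session(*session);  /* hypothetical checkpoint hook */
    else
      Genode::warning("capability does not refer to a local LOG session");
  });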
Best regards Norman
Hello Norman,
thank you for your great answer. I will follow your advice and virtualize all necessary services that a target component uses.
Kind regards, Denis
Hello again,
I have two small problems where I need some guidance from you :)
1. I am trying to understand the mechanism of l4_task_map [1]. Are the following thoughts correct?
* The destination and source task cap (first 2 args of l4_task_map) can be retrieved through Pd_session::native_pd() and Foc_native_pd::task_cap().
* Send flexpage (arg #3) describes a memory area which contains the selector number (= address) of the source task's capability.
* The send base (arg #4) is an integer which contains the address of the capability in the destination task and also an operation code for e.g. mapping or granting the capability.
[1] https://l4re.org/doc/group__l4__task__api.html#ga0a883fb598c3320922f0560263d...
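In code, I imagine the call would look roughly like this (I am not sure whether the selector numbers need the L4_CAP_SHIFT conversion at this point, so please correct me):

  /* sketch: map the capability at 'src_sel' in 'src_task' to
     'dst_sel' in 'dst_task' */
  l4_msgtag_t tag =
    l4_task_map(dst_task, src_task,
                l4_obj_fpage(src_sel << L4_CAP_SHIFT, 0, L4_FPAGE_RWX),
                (dst_sel << L4_CAP_SHIFT) | L4_ITEM_MAP);
  if (l4_msgtag_has_error(tag))
    ; /* handle the map error */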
To iterate through all possible capabilities I need to know where the capability space starts (first valid selector number) and where it ends. Where can I find this information, i.e., which source files are relevant?
2. I also wanted to look up the mechanism of Noux where it re-initializes the parent cap, the noux session cap, and the caps of a child's environment after a fork. But I cannot find the corresponding files.
Kind regards, Denis
Hello Denis,
On 09/21/2016 05:42 PM, Denis Huber wrote:
Hello again,
I have two small problems where I need some guidance from you :)
1. I am trying to understand the mechanism of l4_task_map [1]. Are the following thoughts correct?
* The destination and source task cap (first 2 args of l4_task_map) can be retrieved through Pd_session::native_pd() and Foc_native_pd::task_cap().
* Send flexpage (arg #3) describes a memory area which contains the selector number (= address) of the source task's capability.
* The send base (arg #4) is an integer which contains the address of the capability in the destination task and also an operation code for e.g. mapping or granting the capability.
[1] https://l4re.org/doc/group__l4__task__api.html#ga0a883fb598c3320922f0560263d...
That is correct.
To iterate through all possible capabilities I need to know where the capability space starts (first valid selector number) and where it ends. Where can I find these information? I.e. which source files are relevant?
The capability space of each component is split between an area controlled by core and one controlled by the component itself. Everything underneath Fiasco::USER_BASE_CAP (in file repos/base-foc/include/foc/native_capability.h:63) is used by core and has the following layout: the first nine slots are reserved so as not to interfere with the fixed capabilities of Fiasco.OC/L4Re. The only capabilities of this fixed area that we use are the task capability (slot 1) and the parent capability (slot 8). The rest of the core area is divided into thread-local capabilities. Every thread has three dedicated capabilities: a capability to its own IPC gate (its identity, so to speak), a capability to its pager object, and a capability to an IRQ object (a kind of kernel semaphore) that is used for blocking in the case of lock contention. You can find the layout information in the file repos/base-foc/include/foc/native_capability.h.
Everything starting from slot 200 is controlled by the component itself. Each component has a capability allocator, and some kind of registry containing all currently allocated capabilities that is called "cap map":
repos/base-foc/src/include/base/internal/cap_*
repos/base-foc/src/lib/base/cap_*
Currently, the per-component capability allocator is (at compile time) restricted to at most 4K capabilities. The special component core can allocate more capabilities because it always owns every capability in the system.
The capability space controlled by the component thereby ranges from slot 200 to 4296, but it is filled sparsely. Without knowing the "cap map" of a component, you can still check the validity of a single capability with `l4_task_cap_valid`; have a look here:
https://l4re.org/doc/group__l4__task__api.html#ga829a1b5cb4d5dba33ffee57534a...
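A brute-force scan over the component-controlled area could then look roughly like this (converting slot numbers to capability indices via L4_CAP_SHIFT):

  /* sketch: probe every slot of the component-controlled area */
  enum { USER_BASE_CAP = 200, CAP_COUNT = 4096 };

  for (unsigned sel = USER_BASE_CAP; sel < USER_BASE_CAP + CAP_COUNT; sel++) {
    l4_msgtag_t tag = l4_task_cap_valid(task, sel << L4_CAP_SHIFT);
    if (l4_msgtag_label(tag))
      ; /* slot 'sel' holds a valid capability */
  }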
2. I also wanted to look up the mechanism of Noux where it re-initializes the parent cap, the noux session cap, and the caps of a child's environment after a fork. But I cannot find the corresponding files.
AFAIK, in Noux the parent capability in the .data section of the program gets overwritten:
repos/ports/src/noux/child.h:458
repos/ports/src/noux/ram_session_component.h:80
After that, parts of the main-thread initialization of the target need to be re-done; otherwise, e.g., the serialized form of the parent capability in the data section would have no effect. But I'm not well versed in the Noux initialization. After some grep, I found this to be the first routine executed by the forked process:
repos/ports/src/lib/libc_noux/plugin.cc:526
It shows how the parent capability gets set and how the environment gets re-loaded.
Best regards Stefan
Hello Stefan,
thank you for the descriptive explanation :) I found out that it does not suffice to map the (kernel) Capability from the target application to the Checkpoint/Restore application, because the Checkpoint/Restore application knows only the already existing (Genode) Capabilities (kcap and key value) through the interception of the Rpc_objects (e.g. own dataspace, rm_session, etc.) which the target application uses.
Mapping a Capability gives me a new (kernel) Capability which points to the same object identity, but has a new kcap (= Capability space slot) value.
By intercepting all services the target application uses, the Checkpoint/Restore application (probably) knows all necessary Capabilities which are created through requests to the parent. But what about Capabilities which are created through a local service of the target application?
The target application could create its own service with a root and session Rpc_object and manage requests through an Entrypoint. The Entrypoint does create new Capabilities through the PD session (Pd_session::alloc_rpc_cap), which the Checkpoint/Restore component intercepts. However, the Checkpoint/Restore application cannot associate a created Capability with the concrete Rpc_object which the target application created itself.
To solve this problem, I did not find any solution that is transparent to the target application nor possible without modifying the kernel. A non-transparent but user-level solution would be to let the Checkpoint/Restore application implement the service of the target application. But this would require rewriting existing Genode components, which I would like to avoid.
Perhaps someone in the Genode community has an idea, how I can get access to the target application's Rpc_objects created by its own service.
Kind regards, Denis
Hi Denis,
The target application could create its own service with a root and session Rpc_object and manage requests through an Entrypoint. The Entrypoint does create new Capabilities through the PD session (Pd_session::alloc_rpc_cap), which the Checkpoint/Restore component intercepts. However, the Checkpoint/Restore application cannot associate a created Capability with the concrete Rpc_object which the target application created itself.
that is true. The monitoring component has no idea about the meaning of RPC objects created internally within the child.
But the child never uses such capabilities to talk to the outside world. If such a capability is created to provide a service to the outside world (e.g., a session capability), your monitoring component will actually get hold of it along with the information of its type. I.e., the child passes a root capability via the 'Parent::announce' RPC function to the monitoring component, or the monitoring component receives a session capability as a response of a 'Root::session' RPC call (which specifies the name of the session type as argument).
Those capabilities are - strictly speaking - not needed to make the child happy, but merely to enable someone else to use the child's service. However, there is also the case where the child uses RPCs in a component-local way. Even though the monitoring component does not need to know the meaning behind those capabilities, it needs to replicate the association of the component's internal RPC objects with the corresponding kernel capabilities.
To solve this problem, I did not find any solution that is transparent to the target application nor possible without modifying the kernel. A non-transparent but user-level solution would be to let the Checkpoint/Restore application implement the service of the target application. But this would require rewriting existing Genode components, which I would like to avoid.
Perhaps someone in the Genode community has an idea, how I can get access to the target application's Rpc_objects created by its own service.
This is indeed a tricky problem. I see two possible approaches:
1. Because the monitoring component is in control of the child's PD session (and thereby the region map of the child's address space), it may peek and poke in the virtual memory of the child (e.g., it may attach a portion of the child's address space as a managed dataspace to its own region map). In particular, it could inspect and manipulate the child-local meta data for the child's capability space where it keeps the association between RPC object identities and kcap selectors. This approach would require the monitor to interpret the child's internal data structures, similar to what a debugger does.
2. We may let the child pro-actively propagate information about its capability space to the outside so that the monitoring component can conveniently intercept this information. E.g. as a rough idea, we could add a 'Pd_session::cap_space_dataspace' RPC function where a component can request a dataspace capability for a memory buffer where it reports the layout information of its capability space. This could happen internally in the base library. So it would be transparent for the application code.
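As a sketch of this second idea (both the RPC function and the record layout are invented here, not existing API):

  /* hypothetical addition to 'Pd_session' */
  virtual Dataspace_capability cap_space_dataspace() = 0;

  /* hypothetical entry written by the base library into that dataspace,
     one per allocated selector of the component's cap space */
  struct Cap_info
  {
    addr_t   kcap;   /* slot in the component's capability space */
    uint16_t badge;  /* Rpc_obj_key of the associated Genode capability */
  };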
I think however that merely propagating information from the child may not be enough. You also may need a way to re-assign new RPC object identities to the capability space of the restored child.
Noux employs a mix of both approaches when forking a process. The parent capability is poked directly into the address space of the new process whereas all other capabilities are re-initialized locally in the child. Maybe you could find a middle ground where the child component reports just enough internal information (e.g., the pointer to its 'cap_map') to let the monitor effectively apply the first approach (peeking and poking)?
Btw, just as a side remark, this problem does not exist on the base-hw kernel where the RPC object identities are equal to the capability selectors.
Cheers Norman
Hello Norman,
thanks again for your explanation.
It sounds good that I do not have to checkpoint the component-internal session capabilities if they are not used by the component itself. What about the locally created capabilities which are created during Entrypoint creation?
In particular, when the target component creates an Entrypoint object, it creates a Native_capability (as Ipc_server) from a capability found in the UTCB's thread control registers:
repos/base-foc/src/lib/base/ipc.cc:377
The Ipc_server capability is used in two calls to Pd_session::alloc_rpc_cap during Entrypoint object creation. The two calls result from Entrypoint::manage: one for the exit handler of the Rpc_entrypoint and one for the Signal_proxy_component of the signal API. To recreate those Native_capabilities at restore time, I have to use the same Ipc_server capability. How can this be done?
I also have some general questions about Genode capabilities in Fiasco.OC: In the Genode Foundations book, on page 37, there is a figure (figure 2) with an RPC object and its object identity. What is an object identity in Fiasco.OC?
* How is it called there?
* Where can I find it in the source files?
* Does it comprise information about...
* ...the owner of the RPC object?
* ...which component has the data in memory?
* ...where it can be found in the address space?
Kind regards, Denis
Hello Denis,
let me start with the questions raised at the end of your posting:
I also have some general questions about Genode capabilities in Fiasco.OC: In the Genode Foundations book, on page 37, there is a figure (figure 2) with an RPC object and its object identity. What is an object identity in Fiasco.OC?
- How is it called there?
Note that the book is focused on NOVA and base-hw, which resemble the presented capability model quite closely. On other kernels like seL4 and Fiasco.OC, there is a slight mismatch between the kernel-provided capability mechanism and Genode's notion of capabilities. You can find a discussion of one important distinction at [1]. Hence, on these kernels, we need to "emulate" parts of Genode's capability model using the kernel features at hand.
[1] http://sel4.systems/pipermail/devel/2014-November/000112.html
On Fiasco.OC, an object identity consists of two parts:
1. An IPC gate bound to the entrypoint thread. The IPC gate is a Fiasco-internal kernel object. The user-level component refers to it via a kernel-capability selector, which is (like a file descriptor on Unix) a component-local number understood by the kernel. In Genode's code, we use the term "kcap" as an abbreviation of kernel capability selector. The kcap can be used as an argument for kernel operations, in particular as a destination for an IPC call.
Note that kcaps may refer to various kernel objects (like threads, tasks). But - a few exceptions notwithstanding - a Genode capability (as returned by 'Entrypoint::manage') refers to the kcap for an IPC gate.
2. A system-globally unique object ID, which is allocated by core's 'Pd_session::alloc_rpc_cap' operation. Unlike the kcap, the kernel has no idea what this ID is about. It is just a number. Within the Genode code, this number is called "badge" or "Rpc_obj_key". The badge value is used at the server side as a key for looking up the RPC object that belongs to an incoming RPC request.
Each Genode capability carries both parts. When a component inserts a new capability into its capability space, you can see both values as arguments to 'Capability_map::insert_map' (in base-foc/src/lib/base/cap_map.cc).
When a Genode capability is transferred as RPC argument (via 'copy_msgbuf_to_utcb' and 'extract_msg_from_utcb' in base-foc/src/lib/ipc.cc), you can see that the kcap part is passed as a 'L4_MAP_ITEM' whereas the badge is transferred as plain message word.
When a Genode capability is created for an RPC object ('Entrypoint::manage' -> 'PD_session::alloc_rpc_cap'), core imprints the new badge value into the new IPC gate. This way, whenever an IPC is sent to the IPC gate, the receiving thread (the server) receives the badge value of the invoked object directly from the kernel. This way, a misbehaving client cannot deliberately fake the badge when invoking an RPC object.
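In kernel terms, the imprinting boils down to passing the badge as the label argument when the IPC gate is created, along these lines (a sketch with made-up variable names; 'l4_factory_create_gate' is the l4sys function used for gate creation):

  /* sketch: create an IPC gate bound to 'ep_thread' and imprint 'badge'
     as its protected label, which the kernel delivers with each IPC */
  l4_msgtag_t tag =
    l4_factory_create_gate(L4_BASE_FACTORY_CAP,
                           gate_sel,   /* kcap selector for the new gate */
                           ep_thread,  /* kcap of the entrypoint thread  */
                           badge);     /* Rpc_obj_key value              */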
Given this background, I hope that my remark about base-hw in my previous email becomes clearer. On base-hw, we can simply use the value of a kernel capability selector as badge value. There is no need for system-globally unique ID values.
* ...the owner of the RPC object?
The kcap of a Genode capability refers to an IPC gate. The IPC gate is associated with a thread. The thread lives in a protection domain. The owner of an RPC object is therefore implicitly the PD that created the IPC gate (the caller of 'Entrypoint::manage') for the RPC object.
* ...which component has the data in memory?
Each component keeps track of its local capability space using some meta-data structures managed internally within the 'base' library. On Fiasco.OC, this data structure is called 'cap_map'. It maintains the association of 'kcap' values with their corresponding badges.
For the checkpointing/restarting, these data structures must be interpreted/updated by the monitoring component. This may be tricky because the cap map is not designed to be easy to manipulate from the outside. I.e., it has an AVL tree with the badges as keys. By solely poking new badge values into the component's cap map, the order of the AVL nodes would become corrupted.
It may be possible to simplify the cap-map implementation by removing the AVL tree. As far as I can see, the use case for looking up a kcap by a badge no longer exists, except for sanity checks within the implementation of the cap map itself. The method 'Capability_map::find' is actually unused.
* ...where it can be found in the address space?
The cap map is instantiated as local static variable of the function 'Genode::cap_map' in 'base-foc/src/lib/base/cap_map.cc'. Hence, it is located somewhere in the data segment of the binary. You'd need to somehow communicate the pointer value from the component to the monitor, e.g. by writing it to a dataspace shared between the two.
In particular, when the target component creates an Entrypoint object, it creates a Native_capability (as Ipc_server) from a capability found in the UTCB's thread control registers:
repos/base-foc/src/lib/base/ipc.cc:377
The Ipc_server capability is used in two calls to Pd_session::alloc_rpc_cap during Entrypoint object creation. The two calls result from Entrypoint::manage: one for the exit handler of the Rpc_entrypoint and one for the Signal_proxy_component of the signal API. To recreate those Native_capabilities at restore time, I have to use the same Ipc_server capability. How can this be done?
The Ipc_server capability is the kernel capability selector for the entrypoint thread. This selector is used to associate the new IPC gate with this particular thread. So IPCs sent to the IPC gate will arrive at the entrypoint thread. It is not a regular Genode capability because it refers to a thread instead of an IPC gate. Fortunately, the monitor is able to get hold of this capability because the component requests it by calling 'Cpu_thread::state' when creating a new thread (base-foc/src/lib/base/thread_start.cc).
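Since your monitor virtualizes the CPU session anyway, it could record this selector when the 'state' RPC passes through, roughly like this (the wrapper class and the bookkeeping helper are made up, and I assume the Foc-specific 'Thread_state' exposes the gate selector via its 'kcap' member):

  /* sketch: inside the monitor's virtualized CPU-thread wrapper */
  Genode::Thread_state state() override
  {
    Genode::Thread_state ts = _real_cpu_thread.state();

    _record_ipc_server_kcap(ts.kcap);  /* remember for checkpointing */

    return ts;
  }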
Yes, the topic is really complicated. Is the confusion perfect now? ;-)
Cheers Norman
Hello Norman,
you are right, it is quite complicated, but I think I understand the capability concept in Genode with Fiasco.OC. Let me recap it:
I created a simple figure [1] to illustrate my thoughts. A component has a capability map and a kernel-internal capability space. Each managed RPC object has a capability which points to a capability map slot that stores a system-global identifier called a badge. The capability space slot can be computed from the capability map slot. The corresponding capability space slot points to the object identity, which is an IPC gate.
[1] https://github.com/702nADOS/genode-CheckpointRestore-SharedMemory/blob/b78f5...
In order to restore a component on another ECU, the checkpointed variables representing capabilities (entries in memory, e.g. on the stack) have to be made valid again. Therefore, I have to restore the IPC gate and the capability space slot pointing to this IPC gate, and allocate a new badge, because a badge is valid only within one system and the component is migrated to another. Also, I have to restore the capability map slot to point to the new badge, and restore the RPC object.
In the following I assume that the RPC objects of the target component are created by the Checkpoint/Restore component (i.e. it intercepts the session requests and provides own sessions at child creation). The other case regarding local RPC objects of the target component will be discussed later, if I hopefully have the time:
By virtualizing the session RPC objects and the normal RPC objects, I can checkpoint their state. Thus, I can recreate an RPC object. When I do that, the RPC object has a new capability (local to the Checkpoint/Restore component) and a valid badge. Implicitly, a valid IPC gate is also recreated. Thus, the target component has to know this capability inside its protection domain. Therefore, the capability space/map slots have to point to the IPC gate or to the new badge, respectively:
* The capability space slot is recreated by issuing l4_task_map to map a capability from core to the target child. This is done by extending the Foc_native_pd interface (see an earlier mail from Norman).
* The capability map slot is recreated by Capability_map::insert(new_badge, old_kcap). Thus, I have to checkpoint the kcap by Capability_map::find(new_badge)->kcap().
Now I am missing the pointer to the target component's internal capability map. I already have all dataspace capabilities which are attached to the target's address space. With the pointer, I can cast it to a Capability_map* and use its methods to manipulate the AVL tree. Please correct me if I am wrong.
Norman, you proposed a rough idea of how to obtain a dataspace capability of the capability map through the PD_session in one of your previous mails:
On 07.10.2016 09:48, Norman Feske wrote:
- We may let the child pro-actively propagate information about its capability space to the outside so that the monitoring component can conveniently intercept this information. E.g. as a rough idea, we could add a 'Pd_session::cap_space_dataspace' RPC function where a component can request a dataspace capability for a memory buffer where it reports the layout information of its capability space. This could happen internally in the base library. So it would be transparent for the application code.
Can you or of course anyone else elaborate on how it "could happen internally in the base library"? Does core know the locations of capability maps of other components?
Kind regards, Denis
PS: If my thoughts contain a mistake, please feel free to correct me. It would help me a lot :)
Hi Denis,
I created a simple figure [1] to illustrate my thoughts. [...]
[1] https://github.com/702nADOS/genode-CheckpointRestore-SharedMemory/blob/b78f5...
the figure is good except for the detail that the capability map should appear within the protection domain. It is a component-local data structure.
In order to restore a component on another ECU, the checkpointed variables representing capabilities (entries in memory, e.g. stack) have to be made valid. Therefore, I have to restore the IPC gate, the capability space slot pointing to this IPC gate, and allocate a new badge, because it is valid only in one system and the component is migrated to another system. Also, I have to restore the capability map slot to point to the new badge and restore the RPC object.
Exactly.
In the following I assume that the RPC objects of the target component are created by the Checkpoint/Restore component (i.e. it intercepts the session requests and provides own sessions at child creation). The other case regarding local RPC objects of the target component will be discussed later, if I hopefully have the time:
By virtualizing the session RPC objects and the normal RPC objects, I can checkpoint the state of them. Thus, I can recreate an RPC object.
I do not completely understand what you mean by "virtualizing RPC objects". To recap the terminology, an RPC object is a data structure that is local to the component. When restoring the virtual address space of the component, this data structure gets re-created automatically. However the data structure contains the capability (it is an 'Object_pool::Entry') as a member. The "local name" aka "badge" of this capability is used as a key to look up the invoked object of an incoming RPC request. This capability originated from 'Pd_session::alloc_rpc_cap'.
When I do that the RPC object has a new capability (local to the Checkpoint/Restore component) and a valid badge. Implicitly a valid IPC gate is also recreated. Thus, the target component has to know this capability inside its protection domain. Therefore, the capability space/map slot has to point to the IPC gate or to the new badge, respectively.
- The capability space slot is recreated by issuing l4_task_map to map a capability from core to the target child. This is done by extending the Foc_native_pd interface (see an earlier mail from Norman).
- The capability map slot is recreated by Capability_map::insert(new_badge, old_kcap). Thus, I have to checkpoint the kcap by Capability_map::find(new_badge)->kcap().
Yes. The problem is that the latter operation is a component-local manipulation of its cap map data structure. The monitor cannot call the function in the target's address space directly.
Now I am missing the pointer to target component's internal capability map.
I already have all dataspace capabilities which are attached to the target's address space. With the pointer I can cast it to a Capability_map* and use its methods to manipulate the Avl-tree. Please correct me if I am wrong.
This won't work that easily. The AVL tree contains pointers that point to some place within the target's address space. A function call would ultimately de-reference those pointers. If you attach (a part of) the target's address space within the monitor's address space, the pointers would generally not be valid in the monitor's address space. Aside from that, I do not think that it would be a good idea to let the monitor de-reference pointer values originating from the (untrusted) target.
The AVL tree must be manipulated without relying on the original code. To sidestep this issue, I proposed to simplify the data structure, e.g., by replacing the AVL tree by a list. Then, the monitor just needs to write new badge values into the target's memory but won't need to manipulate the target's data structures. This applies to the AVL tree used in the cap map and the AVL tree used by the object pool (which also uses the badge as key).
Granted, by using a plain list, the lookup becomes slower. But you remove a show stopper for your actual research goal. Once, the checkpointing works, we can still try to solve the AVL tree problem.
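To sketch what the monitor-side patching could then look like, assume a hypothetical list-based cap map whose elements carry 'next', 'kcap', and 'badge' fields, plus a helper that translates target-local pointer values into the monitor's local attachment of the target's memory (all names invented here):

  using Genode::addr_t;

  /* hypothetical layout of one element of a list-based cap map */
  struct Cap_entry { addr_t next; addr_t kcap; Genode::uint16_t badge; };

  /* translate a pointer value that is valid inside the target's address
     space into a pointer within the monitor's local attachment */
  template <typename T>
  T *local_ptr(addr_t target_addr) {
    return reinterpret_cast<T *>(local_base + (target_addr - target_base)); }

  /* walk the target's list and poke the new badge values */
  for (addr_t e = first_entry; e; e = local_ptr<Cap_entry>(e)->next) {
    Cap_entry *entry = local_ptr<Cap_entry>(e);
    entry->badge = new_badge_for(entry->kcap);  /* hypothetical lookup */
  }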
Norman, you proposed a rough idea of how to obtain a dataspace capability of the capability map through the PD_session in one of your previous mails:
On 07.10.2016 09:48, Norman Feske wrote:
- We may let the child pro-actively propagate information about its capability space to the outside so that the monitoring component can conveniently intercept this information. E.g. as a rough idea, we could add a 'Pd_session::cap_space_dataspace' RPC function where a component can request a dataspace capability for a memory buffer where it reports the layout information of its capability space. This could happen internally in the base library. So it would be transparent for the application code.
Can you, or of course anyone else, elaborate on how it "could happen internally in the base library"? Does core know the locations of the capability maps of other components?
No. But my suggestion of the PD-session extension was not concerned with capability maps at all. The proposed mechanism would only operate on the target's capability space. The capability map must be adjusted by the monitor by manipulating the target's memory. Both pieces of the puzzle are needed: the population of the target's cap space (via an interface provided by core), and the update of the badges in the target's cap map.
By "could happen internally in the base library", I meant that the proactive "leaking" of interesting information (like the base address of the cap map, or the association between kcap selectors and badges) from the target to the monitor could be hidden in the base library (which is locally linked to each component). Because it would not be visible at the API level, it is transparent to component developers.
Cheers Norman
Hello Norman,
thank you for the confirmation of my thoughts and for the correction/clarification of my misunderstandings.
This won't work that easily. The AVL tree contains pointers that point to some place within the target's address space. A function call would ultimately de-reference those pointers. If you attach (a part of) the target's address space within the monitor's address space, the pointers would generally not be valid in the monitor's address space. Aside from that, I do not think that it would be a good idea to let the monitor de-reference pointer values originating from the (untrusted) target.
The AVL tree must be manipulated without relying on the original code. To sidestep this issue, I proposed to simplify the data structure, e.g., by replacing the AVL tree by a list. Then, the monitor just needs to write new badge values into the target's memory but won't need to manipulate the target's data structures. This applies to the AVL tree used in the cap map and the AVL tree used by the object pool (which also uses the badge as key).
Thank you for the hint with the pointers inside an AVL tree. I did not thought my concept to the end and missed the fact, that the pointers are only valid inside target's address space. Thus, I will simplify the AVL tree for the cap map and the AVL tree of the object pool to lists. And just change the values of the list elements. I will still need to use the pointer of List::Element::_next, but I will have to convert the pointer to point to a dataspace of the target.
No. But my suggestion of the PD-session extension was not concerned with capability maps at all. The proposed mechanism would only operate on the target's capability space. The capability map must be adjusted by the monitor by manipulating the target's memory. Both pieces of the puzzle are needed: the population of the target's cap space (via an interface provided by core), and the update of the badges in the target's cap map.
Now I understand. You mean I can propagate information from the target component through its capability space. As a simple example, I could create an (unbound) IPC gate, store a pointer in its label, and use a capability space slot to reference it. I could take one from the area controlled by core, which will (probably) not be overridden: I could take the last one, which is 0x1ff.
By "could happen internally in the base library", I meant that the proactive "leaking" of interesting information (like the base address of the cap map, or the association between kcap selectors and badges) from the target to the monitor could be hidden in the base library (which is locally linked to each component). Because it would not be visible at the API level, it is transparent to component developers.
I could create the IPC gate and perform the assignment in the startup code of the application. I found a file where a function initializes the main thread:
base_foc/src/lib/base/thread_bootstrap.cc:30
It is called prepare_init_main_thread. Is it safe to use this function, or is there a better one?
What do you think about my approach? Will it work, theoretically, under the assumption that the last capability slot will not be used by any Genode/Fiasco.OC library nor by any future library/component?
Kind regards, Denis
Hi Denis,
Now I understand. You mean I can propagate information from the target component through its capability space. As a simple example, I could create an (unbound) IPC gate, store a pointer in its label, and use a capability space slot to reference it. I could take one from the area controlled by core, which will (probably) not be overridden: I could take the last one, which is 0x1ff.
I had a pretty simple idea in mind: The PD session could offer an RPC function like this:
Dataspace_capability cap_space_info();
A component can call this function to obtain a RAM dataspace of a predefined size from the PD service.
The component startup code calls this function. If it returns a valid dataspace, it attaches the dataspace to its address space. So now, we have a shared memory block that can be written to by the target and inspected by the monitor (because it is the PD service that handed out the 'cap_space_info' dataspace).
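In the startup code, this handshake might look as follows ('cap_space_info' is the hypothetical RPC function from above; the sketch uses the legacy Genode environment API of that era):

  /* sketch: obtain and attach the shared reporting dataspace; if core
     hands out an invalid capability, the reporting is skipped */
  Genode::Dataspace_capability ds =
      Genode::env()->pd_session()->cap_space_info();

  void *info_buffer = nullptr;
  if (ds.valid())
      info_buffer = Genode::env()->rm_session()->attach(ds);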
Once the shared memory is in place, the target can record information that might be of interest to the monitor in this dataspace. E.g., it may maintain an array of the following struct:
struct Cap_info
{
    unsigned kcap;       /* capability-space selector */
    addr_t   badge_ptr;  /* pointer to badge value within cap map */
};
Thereby, the target tells the monitor everything needed to extract and update the entries of the cap map.
This array could be updated initially (when the dataspace is mapped, the cap map already contains a bunch of capabilities), and whenever a capability is inserted/removed into/from the cap map.
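For illustration, recording an insertion could look like this (the array capacity and the use of kcap == 0 as a free-slot marker are assumptions of this sketch):

  enum { MAX_CAPS = 4096 };  /* assumed capacity of the shared array */

  /* sketch: publish a newly inserted capability to the monitor */
  void report_insert(Cap_info *infos, unsigned kcap, addr_t badge_ptr)
  {
      for (unsigned i = 0; i < MAX_CAPS; i++)
          if (infos[i].kcap == 0) {
              infos[i].kcap      = kcap;
              infos[i].badge_ptr = badge_ptr;
              return;
          }
  }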
Note that since both the target and the monitor access the cap-space-info dataspace concurrently, you'd need to add some synchronization between the two, e.g., by maintaining some bits within the dataspace that act as a simple spinlock.
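For example, such a spinlock could be built on 'Genode::cmpxchg' from 'cpu/atomic.h' (a minimal sketch, not production-grade synchronization):

  #include <cpu/atomic.h>  /* Genode::cmpxchg */

  /* sketch: spin on a word placed inside the cap-space-info dataspace */
  void lock(volatile int *flag)
  {
      while (!Genode::cmpxchg(flag, 0, 1))
          ;  /* busy-wait until the other side releases the lock */
  }

  void unlock(volatile int *flag) { *flag = 0; }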
Core's version of 'Pd_session::cap_space_info' would return an invalid capability. In this case, the component skips the reporting.
I could create the IPC gate and perform the assignment in the startup code of the application. I found a file where a function initializes the main thread:
base_foc/src/lib/base/thread_bootstrap.cc:30
It is called prepare_init_main_thread. Is it safe to use this function, or is there a better one?
It's a good choice as it is specific to Fiasco.OC.
What do you think about my approach? Will it work, theoretically, under the assumption that the last capability slot will not be used by any Genode/Fiasco.OC library nor by any future library/component?
I think that your approach could work well, but it adds one IPC call for each cap-map operation. The shared-memory idea as drafted above does not impose this overhead. That said, I would suggest implementing whichever way you find easier. ;-)
Cheers Norman