Dear Genode community,
thanks to you [1], I was able to implement my Checkpoint/Restore mechanism on Genode/Fiasco.OC. I also added an incremental checkpoint optimization that stores only the memory regions that changed since the last checkpoint (although this does not work reliably due to a Fiasco.OC bug, which Stefan Kalkowski found for me [2]). I also managed to checkpoint the capability map, restore it with new badges, and insert the missing capabilities into the Fiasco.OC capability space.
[1] https://sourceforge.net/p/genode/mailman/message/35322604/
[2] https://sourceforge.net/p/genode/mailman/message/35377269/
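To make the incremental checkpoint idea concrete, here is a minimal, self-contained sketch. It is not the code from my component: it detects changes by plainly comparing each page against the previous checkpoint, which is a stand-in for whatever the kernel-assisted detection does, and the names PAGE_BYTES, Dirty_page, and incremental_checkpoint are made up for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical page granularity, chosen only for this illustration.
constexpr std::size_t PAGE_BYTES = 4096;

// One changed page: its offset within the region plus a copy of its content.
struct Dirty_page
{
    std::size_t               offset;
    std::vector<std::uint8_t> content;
};

// Compare 'current' with the previous checkpoint 'last' page by page,
// return only the pages that differ, and bring 'last' up to date.
std::vector<Dirty_page> incremental_checkpoint(std::vector<std::uint8_t> const &current,
                                               std::vector<std::uint8_t>       &last)
{
    // Both buffers must describe the same memory region.
    assert(current.size() == last.size());

    std::vector<Dirty_page> delta;

    for (std::size_t off = 0; off < current.size(); off += PAGE_BYTES) {
        std::size_t const len = std::min(PAGE_BYTES, current.size() - off);

        // Unchanged since the last checkpoint: nothing to store.
        if (std::memcmp(current.data() + off, last.data() + off, len) == 0)
            continue;

        // Store a copy of the changed page in the delta.
        delta.push_back(Dirty_page{
            off,
            std::vector<std::uint8_t>(current.begin() + off,
                                      current.begin() + off + len) });

        // Update the reference copy for the next checkpoint.
        std::memcpy(last.data() + off, current.data() + off, len);
    }
    return delta;
}
```

Calling incremental_checkpoint twice in a row without touching the region in between yields an empty delta on the second call, which is the whole point of the optimization.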
My problem is that, although I restore all RPC objects (in particular the instruction and stack pointers of the main thread) as well as the capability map and space, the target component simply starts executing from the beginning of its Component::construct function.
My approach: For the restore phase, I use Genode's native bootstrap mechanism (i.e., I create a Genode::Child object) until the child requests a LOG session from my Checkpoint/Restore component. I force a LOG session request in ::Constructor_component::construct() just before "Genode::call_component_construct(env);" in
https://github.com/genodelabs/genode/blob/16.08/repos/base/src/lib/base/entr...
Up to the session request, several RAM dataspaces, among other RPC objects, are created and attached to the address space. In my restore mechanism, I identify the RPC objects that were created by the bootstrap/startup mechanism and restore only their state. After that point, I recreate all other RPC objects known to the child component and restore their state. Finally, I restore the capability map and space.
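The ordering described above can be sketched as follows. This is only a model of the restore order and not the code from my component: the types Action and Step, the function plan_restore, and the idea of identifying objects by a name string are assumptions made for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// What the restorer does with one entry (hypothetical classification).
enum class Action { RESTORE_STATE_ONLY, RECREATE_AND_RESTORE, RESTORE_CAP_MAP_AND_SPACE };

struct Step
{
    std::string name;
    Action      action;
};

// Build the restore order from the list of checkpointed RPC objects and the
// list of objects that the bootstrap mechanism has already created.
std::vector<Step> plan_restore(std::vector<std::string> const &checkpointed,
                               std::vector<std::string> const &created_by_bootstrap)
{
    auto const exists = [&] (std::string const &name) {
        return std::find(created_by_bootstrap.begin(),
                         created_by_bootstrap.end(), name)
               != created_by_bootstrap.end();
    };

    std::vector<Step> plan;

    // 1. Objects that already exist after bootstrap: restore their state only.
    for (std::string const &name : checkpointed)
        if (exists(name))
            plan.push_back(Step{ name, Action::RESTORE_STATE_ONLY });

    // 2. All remaining objects: recreate them, then restore their state.
    for (std::string const &name : checkpointed)
        if (!exists(name))
            plan.push_back(Step{ name, Action::RECREATE_AND_RESTORE });

    // 3. The capability map and space come last.
    plan.push_back(Step{ "capability map and space",
                         Action::RESTORE_CAP_MAP_AND_SPACE });

    // Every checkpointed object appears exactly once, plus the final step.
    assert(plan.size() == checkpointed.size() + 1);
    return plan;
}
```

The capability step is placed last on purpose, so that every RPC object a capability refers to already exists by the time the map and space are restored.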
During that process, the mandatory CPU threads are identified (three of them: "ep", "signal_handler", and "childs_rom_name") and restored to their checkpointed state, in particular the ip and sp registers. I did that by using Cpu_thread::state(Thread_state), but without luck. Although I know that the CPU threads had already been started, I also tried calling Cpu_thread::start(ip, sp), again without success.
After the restoration, which happens entirely during the child's LOG session request, my component returns a valid session object to the child. Now the child should resume from the point where it was checkpointed, but instead it continues its execution right after the LOG session request, ignoring the instruction pointer I set.
The source code that restores the CPU thread state can be found in [3]. I used the run script [4] for the tests.
[3] https://github.com/702nADOS/genode-CheckpointRestore-SharedMemory/blob/660a8...
[4] https://github.com/702nADOS/genode-CheckpointRestore-SharedMemory/blob/660a8...
Curiously, the child runs as if nothing had happened, although its stack area was also manipulated.
Perhaps my approach of reusing the bootstrap/startup mechanism is not destined to work, or maybe I have missed some important points of this mechanism. If so, please point me to the problem. I would also consider other restoration approaches, for example, recreating all RPC objects manually and inserting them into the capability map/space. What are your thoughts on my approach? Can it work? Would something else work better?
Kind regards, Denis