Restoring child with checkpointed state

Denis Huber huber.denis at ...435...
Sat Dec 3 11:51:38 CET 2016


Dear Genode community,

thanks to you [1], I was able to implement my Checkpoint/Restore mechanism
on Genode/Fiasco.OC. I also added an incremental checkpoint optimization,
which stores only the memory regions that changed since the last
checkpoint (although this does not work reliably due to a Fiasco.OC bug,
which Stefan Kalkowski found for me [2]). I also managed to checkpoint the
capability map, restore it with new badges, and insert missing
capabilities into the capability space of Fiasco.OC.

[1] https://sourceforge.net/p/genode/mailman/message/35322604/
[2] https://sourceforge.net/p/genode/mailman/message/35377269/
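
To illustrate what I mean by the incremental checkpoint, here is a minimal
standalone sketch. It is not the code from my repository: it finds changed
pages by comparing their contents against the previous snapshot, which is
an assumption made only to keep the example self-contained. The names
page_bytes and incremental_checkpoint are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical page granularity, chosen for illustration only.
constexpr std::size_t page_bytes = 4096;

// Copy into `snapshot` only those pages of `memory` that differ from it.
// If the snapshot has a different size (e.g. first checkpoint), every
// page is copied. Returns the indices of the pages that were copied.
std::vector<std::size_t>
incremental_checkpoint(const std::vector<std::uint8_t> &memory,
                       std::vector<std::uint8_t> &snapshot)
{
    std::vector<std::size_t> copied;

    // A size mismatch means there is no usable previous checkpoint.
    const bool full = snapshot.size() != memory.size();
    if (full)
        snapshot.assign(memory.size(), 0);

    std::size_t page = 0;
    for (std::size_t off = 0; off < memory.size(); off += page_bytes, ++page) {
        // The last page may be shorter than page_bytes.
        const std::size_t len = std::min(page_bytes, memory.size() - off);

        if (full || std::memcmp(&memory[off], &snapshot[off], len) != 0) {
            std::memcpy(&snapshot[off], &memory[off], len);
            copied.push_back(page);
        }
    }
    return copied;
}
```

Calling it twice in a row without modifying the memory in between copies
nothing the second time; touching a single byte causes exactly the page
containing that byte to be copied again.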

My problem is that, although I restore all RPC objects (in particular the
instruction and stack pointer of the main thread) as well as the
capability map and space, the target component just starts its execution
from the beginning of its Component::construct function.

My approach:
For the restore phase, I use Genode's native bootstrap mechanism (i.e. I 
create a Genode::Child object) until it requests a LOG session from my 
Checkpoint/Restore component. I force a LOG session request in 
::Constructor_component::construct() just before 
"Genode::call_component_construct(env);" in

https://github.com/genodelabs/genode/blob/16.08/repos/base/src/lib/base/entrypoint.cc#L154

By the time of the session request, several RAM dataspaces, among other
RPC objects, have been created and attached to the address space. In my
restore mechanism I identify the RPC objects that were created by the
bootstrap/startup mechanism and only restore their state. After that
point, I recreate and restore the state of all other RPC objects that are
known to the child component. Finally, I restore the capability map and
space.
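
The ordering described above can be sketched as follows. This is a
simplified standalone model, not Genode code: Stored_object, Live_objects,
and restore_in_order are hypothetical names, and the state of an RPC
object is reduced to a single integer.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the checkpointed state of one RPC object.
struct Stored_object {
    std::string name;
    int         state;
};

// Hypothetical model of the RPC objects that currently exist in the
// child, keyed by name. Initially it holds what bootstrap created.
using Live_objects = std::map<std::string, int>;

// Apply a checkpoint in three phases and return the order of the steps.
std::vector<std::string>
restore_in_order(const std::vector<Stored_object> &checkpoint,
                 Live_objects &live)
{
    std::vector<std::string> steps;

    // Phase 1: objects that bootstrap already created are not
    // recreated; only their state is overwritten.
    for (const auto &obj : checkpoint) {
        auto it = live.find(obj.name);
        if (it != live.end()) {
            it->second = obj.state;
            steps.push_back("restore " + obj.name);
        }
    }

    // Phase 2: every remaining object is recreated with its stored state.
    for (const auto &obj : checkpoint) {
        if (live.find(obj.name) == live.end()) {
            live.emplace(obj.name, obj.state);
            steps.push_back("recreate " + obj.name);
        }
    }

    // Phase 3: the capability map and space come last, once all
    // objects they refer to exist.
    steps.push_back("restore capability map and space");
    return steps;
}
```

The capability map is handled last in this model because its entries refer
to objects that must already exist at that point.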

During that process the mandatory CPU threads are identified (three of
them: "ep", "signal_handler", and "childs_rom_name") and restored to
their checkpointed state, in particular the ip and sp registers. I did
that via Cpu_thread::state(Thread_state), but without luck. Although I
know that the CPU threads had already been started, I also tried calling
Cpu_thread::start(ip, sp), again without success.

After the restoration, which happens entirely during the LOG session
request of the child, my component returns a valid session object to the
child. Now the child should continue its work from the point where it was
checkpointed, but instead it continues its execution right after the LOG
session request, ignoring the instruction pointer I set.

The source code that restores the CPU thread state can be found in [3].
I used the run script [4] for the tests.

[3] https://github.com/702nADOS/genode-CheckpointRestore-SharedMemory/blob/660a865084a4fe8524a0ccacc4bfb97f728482c9/src/rtcr/restorer.cc#L791
[4] https://github.com/702nADOS/genode-CheckpointRestore-SharedMemory/blob/660a865084a4fe8524a0ccacc4bfb97f728482c9/run/rtcr_restore_child.run

Curiously, the child runs as if nothing had happened, although its stack
area was also manipulated.

Perhaps my approach of reusing the bootstrap/startup mechanism cannot
work, or maybe I have missed some important points in this mechanism. If
so, please point me to the problem.
I would also consider other restoration approaches, for example,
recreating all RPC objects manually and inserting them into the
capability map/space.
What are your thoughts on my approach? Can it work? Would something else
work better?


Kind regards,
Denis



