Re: Restoring child with checkpointed state

11 Dec 2016


Hello Norman,
...
What you observe here is the ELF loading of the child's binary. As part
of the 'Child' object, the so-called '_process' member is constructed.
You can find the corresponding code at
'base/src/lib/base/child_process.cc'. The code parses the ELF executable
and loads the program segments, specifically the read-only text segment
and the read-writable data/bss segment. For the latter, a RAM dataspace
is allocated and filled with the content of the ELF binary's data. In
your case, when resuming, this procedure is wrong. After all, you want
to supply the checkpointed data to the new child, not the initial data
provided by the ELF binary.
Fortunately, I encountered the same problem when implementing fork for
noux. I solved it by letting the 'Child_process' constructor accept an
invalid dataspace capability as ELF argument. This has two effects:
First, the ELF loading is skipped (obviously, there is no ELF to load).
Second, the creation of the initial thread is skipped as well.
In short, by supplying an invalid dataspace capability as binary for the
new child, you avoid all those unwanted operations. The new child will
not start at 'Component::construct'. You will have to manually create
and start the threads of the new child via the PD and CPU session
interfaces.
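
To make the two effects concrete, here is a minimal, self-contained C++ sketch of the control flow described above. It is not Genode code: 'Dataspace_capability' and 'Child_process_model' are hypothetical stand-ins invented for this illustration, and the real constructor in 'child_process.cc' may be structured differently.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for a dataspace capability (NOT Genode's type).
// An empty optional models an invalid capability.
struct Dataspace_capability {
    std::optional<std::vector<std::uint8_t>> content;
    bool valid() const { return content.has_value(); }
};

// Hypothetical model of the behaviour described in the text above.
struct Child_process_model {
    bool elf_loaded             = false;
    bool initial_thread_created = false;
    std::vector<std::uint8_t> data_segment;

    explicit Child_process_model(Dataspace_capability const &elf)
    {
        // Invalid binary: skip ELF loading AND initial-thread creation.
        if (!elf.valid())
            return;

        // Stands in for filling a fresh RAM dataspace with the ELF's data.
        data_segment = *elf.content;
        elf_loaded = true;

        // Stands in for creating and starting the initial thread.
        initial_thread_created = true;
    }
};

// Usage: a regular start versus a restore with an invalid capability.
inline void example_usage()
{
    Child_process_model fresh(Dataspace_capability{std::vector<std::uint8_t>{1, 2, 3}});
    assert(fresh.elf_loaded && fresh.initial_thread_created);

    Child_process_model restored(Dataspace_capability{});
    assert(!restored.elf_loaded && !restored.initial_thread_created);
}
```

In the restore case the model leaves the data segment untouched, which is the point: the checkpointed memory content can be supplied instead of the ELF's initial data.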
Thank you for the hint. I will try out your approach.
...
The approach looks good. I presume that you encounter base-foc-specific
peculiarities of the thread-creation procedure. I would try to follow
the code in 'base-foc/src/core/platform_thread.cc' to see what the
interaction of core with the kernel looks like. The order of operations
might be important.
One remaining problem may be that, even though you may be able to
restore most of the thread state, the kernel-internal state cannot
be captured. E.g., think of a thread that was blocking in the kernel via
'l4_ipc_reply_and_wait' when checkpointed. When resumed, the new thread
can naturally not be in this blocking state because the kernel's state
is not part of the checkpointed state. The new thread would possibly
start its execution at the instruction pointer of the syscall and issue
the system call again, but I am not sure what really happens in practice.
Is there a way to avoid this situation? Can I postpone the checkpoint by
letting the entrypoint thread finish the intercepted RPC function call,
and then advance the ip of the child's thread to the next instruction?
...
I think that you don't need the LOG-session quirk if you follow my
suggestion to skip the ELF loading for the restored component
altogether. Could you give it a try?
You are right, the LOG-session quirk seems a bit clumsy. I like your 
idea of skipping the ELF loading and automated creation of CPU threads 
more, because it gives me the control to create and start the threads 
from the stored ip and sp.
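
As a rough sketch of what I have in mind, here is a small self-contained C++ model of recreating threads from stored ip and sp values. All names ('Thread_checkpoint', 'Cpu_session_model', 'restore_threads') are hypothetical and only illustrate the order of operations; the real PD and CPU session interfaces differ, and kernel-internal state is not covered.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record of one checkpointed thread (not a Genode type).
struct Thread_checkpoint {
    std::string    name;
    std::uintptr_t ip;  // stored instruction pointer
    std::uintptr_t sp;  // stored stack pointer
};

// Hypothetical view of a thread inside the model below.
struct Restored_thread {
    std::string    name;
    std::uintptr_t ip;
    std::uintptr_t sp;
    bool           running;
};

// Hypothetical model of a CPU session: create a thread, then start it.
class Cpu_session_model {
    std::vector<Restored_thread> _threads;

public:
    // Creates a stopped thread and returns its index as an identifier.
    std::size_t create_thread(std::string const &name)
    {
        _threads.push_back({name, 0, 0, false});
        return _threads.size() - 1;
    }

    // Starts the thread at the given instruction and stack pointer.
    void start(std::size_t id, std::uintptr_t ip, std::uintptr_t sp)
    {
        Restored_thread &t = _threads.at(id);
        t.ip = ip;
        t.sp = sp;
        t.running = true;
    }

    Restored_thread const &thread(std::size_t id) const { return _threads.at(id); }
    std::size_t count() const { return _threads.size(); }
};

// Recreates every checkpointed thread and starts it from its stored ip/sp.
inline void restore_threads(Cpu_session_model &cpu,
                            std::vector<Thread_checkpoint> const &saved)
{
    for (Thread_checkpoint const &s : saved) {
        std::size_t const id = cpu.create_thread(s.name);
        cpu.start(id, s.ip, s.sp);
    }
}
```

Since no initial thread is created automatically, the restorer decides which threads exist and where each one resumes.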
Best regards,
Denis
