Restoring child with checkpointed state
Stefan Kalkowski
stefan.kalkowski at ...1...
Wed Mar 29 14:05:22 CEST 2017
Hello Denis,
On 03/27/2017 04:14 PM, Denis Huber wrote:
> Dear Genode community,
>
> Preliminary: We implemented a Checkpoint/Restore mechanism on the basis of
> Genode/Fiasco.OC (thanks to the great help of you all). We store the
> state of the target component by monitoring its RPC function calls which
> go through the parent component (= our Checkpoint/Restore component).
> The capability space is indirectly checkpointed through the capability map.
> The restoring of the state of the target is done by restoring the RPC
> objects used by the target component (e.g. PD session, dataspaces,
> region maps, etc.). The capabilities of the restored objects also have
> to be restored in the capability space (kernel) and in the capability map
> (userspace).
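A minimal sketch of the restore step described above. It assumes the checkpoint records, for every capability, the slot it occupied in the child's capability space and the kind of kernel object behind it. The names 'Kernel_object', 'Cap_entry', 'Restored_cap_space' and 'restore_all' are hypothetical and do not correspond to Genode or Fiasco.OC APIs. The sketch only shows one invariant: each object is re-inserted at its original slot, presumably so that selectors stored in the child's memory keep referring to the right objects.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical kinds of kernel objects behind a capability slot.
enum class Kernel_object { Ipc_gate, Irq };

// Hypothetical checkpointed entry of the child's capability map.
struct Cap_entry {
    std::uint32_t slot;   // index in the child's capability space
    Kernel_object kind;   // kind of object that occupied the slot
};

// Hypothetical model of the restored capability space: slot -> object kind.
class Restored_cap_space {
public:
    // Re-insert an object at exactly the slot it had at checkpoint time.
    void insert(Cap_entry const &e) {
        if (!_slots.emplace(e.slot, e.kind).second)
            throw std::runtime_error("slot " + std::to_string(e.slot) +
                                     " already occupied");
    }
    bool occupied(std::uint32_t slot) const { return _slots.count(slot) != 0; }
    Kernel_object kind_at(std::uint32_t slot) const { return _slots.at(slot); }
    std::size_t size() const { return _slots.size(); }

private:
    std::map<std::uint32_t, Kernel_object> _slots;
};

// Restore all checkpointed entries; a duplicate slot indicates a broken checkpoint.
Restored_cap_space restore_all(std::vector<Cap_entry> const &entries) {
    Restored_cap_space space;
    for (Cap_entry const &e : entries)
        space.insert(e);
    return space;
}
```

Keeping the object kind next to the slot also makes it visible when an entry is not a plain IPC gate, which is exactly the case raised below.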
>
> For restoring the target component, Norman suggested using the
> 'Genode::Child' constructor with an invalid ROM dataspace capability,
> which does not trigger the bootstrap mechanism. Thus, we have full
> control over inserting the capabilities of the restored RPC objects into
> the capability space/map.
>
> Our problem is the following: We restore the RPC objects and insert them
> into the capability map and then into the capability space. From the
> kernel's point of view, these capabilities are all "IPC Gates".
> Unfortunately, there was also an IRQ kernel object created by the
> bootstrap mechanism. The following table shows the kernel debugger
> output of the capability space of the freshly bootstrapped target component:
>
> 000204 :0016e* Gate 0015f* Gate 00158* Gate 00152* Gate
> 000208 :00154* Gate 0017e* Gate 0017f* Gate 00179* Gate
> 00020c :00180* Gate 00188* Gate -- --
> 000210 : -- -- 0018a* Gate 0018c* Gate
> 000214 :0018e* Gate 00196* Gate 00145* Gate 00144* IRQ
> 000218 :00198* Gate -- -- --
> 00021c : -- 0019c* Gate -- --
>
> At address 000217 you can see the IRQ kernel object. What does this
> object do, how can we store/monitor it, and how can it be restored?
> Where can we find the source code which creates this object in Genode's
> bootstrap code?
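The slot numbers in the dump above follow from the row layout. Each row starts with the slot index of its first column, and the four columns cover that index plus 0 to 3, so the IRQ in the last column of row 000214 sits at slot 0x217. The following C++ sketch parses rows in that format. It assumes the layout holds for every row; the layout is inferred from this one dump, not from kernel-debugger documentation. 'Slot' and 'parse_row' are hypothetical helpers.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// One occupied slot of the capability space as printed in the dump.
struct Slot {
    std::uint32_t index;  // capability-space slot (row base + column)
    std::string   id;     // object id as printed, e.g. "00144"
    std::string   type;   // "Gate", "IRQ", ...
};

// Parse a row such as "000214 :0018e* Gate 00196* Gate 00145* Gate 00144* IRQ".
// The hex number before ':' is the slot of the first column. Each of the
// following columns is either "--" (empty) or "<id>* <type>".
std::vector<Slot> parse_row(std::string const &line) {
    std::vector<Slot> slots;
    std::size_t colon = line.find(':');
    if (colon == std::string::npos)
        return slots;
    std::uint32_t base = std::stoul(line.substr(0, colon), nullptr, 16);
    std::istringstream in(line.substr(colon + 1));
    std::string tok;
    std::uint32_t column = 0;
    while (in >> tok) {
        if (tok == "--") {  // empty slot, only advances the column
            ++column;
            continue;
        }
        std::string type;
        in >> type;
        if (!tok.empty() && tok.back() == '*')
            tok.pop_back();  // strip the trailing '*' marker
        slots.push_back({base + column, tok, type});
        ++column;
    }
    return slots;
}
```

For the row "000214 :0018e* Gate 00196* Gate 00145* Gate 00144* IRQ", the fourth parsed entry has index 0x217 and type "IRQ".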
The IRQ kernel object you refer to is used by the "signal_handler"
thread to block for signals of core's corresponding service. It is a
base-foc-specific internal core RPC object[1] that is used by the signal
handler[2] and the related capability gets returned by the call to
'alloc_signal_source()' provided by the PD session[3].
I have to admit, I did not follow your current implementation approach
in depth. Therefore, I do not know exactly how to handle this specific
signal handler thread and its semaphore-like IRQ object, but maybe the
references already help you further.
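The following C++ analogy illustrates what "semaphore-like" means here. It replaces the IRQ object with a mutex/condition-variable semaphore. A stand-in for core queues a signal and increments the semaphore, and the handler thread blocks on the semaphore before fetching a signal. This is not base-foc code. 'Semaphore' and 'Signal_source_model' are hypothetical, and the real interaction between the signal handler and core is in the files referenced below. The model makes one point: the number of pending wake-ups lives in the semaphore object itself. If the real IRQ object behaves similarly, that count is kernel-internal state, which monitoring RPC calls alone would not capture.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>

// Analogy only: a counting semaphore standing in for the IRQ object
// that the signal-handler thread blocks on. Not base-foc code.
class Semaphore {
public:
    void up() {
        {
            std::lock_guard<std::mutex> g(_m);
            ++_count;
        }
        _cv.notify_one();
    }
    void down() {  // blocks until the count is positive, then decrements it
        std::unique_lock<std::mutex> l(_m);
        _cv.wait(l, [this] { return _count > 0; });
        --_count;
    }
    unsigned count() const {
        std::lock_guard<std::mutex> g(_m);
        return _count;
    }

private:
    mutable std::mutex _m;
    std::condition_variable _cv;
    unsigned _count = 0;
};

// Hypothetical stand-in for core's signal source: every submitted
// signal is queued and "triggers the IRQ" once.
class Signal_source_model {
public:
    void submit(int signal) {
        {
            std::lock_guard<std::mutex> g(_m);
            _pending.push_back(signal);
        }
        _irq.up();
    }
    // What the handler thread does: block on the semaphore, then fetch.
    int wait_for_signal() {
        _irq.down();
        std::lock_guard<std::mutex> g(_m);
        int s = _pending.front();  // non-empty: each up() follows a push
        _pending.pop_front();
        return s;
    }
    unsigned pending_wakeups() const { return _irq.count(); }

private:
    Semaphore _irq;
    mutable std::mutex _m;
    std::deque<int> _pending;
};
```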
Regards
Stefan
[1] repos/base-foc/src/core/signal_source_component.cc
[2] repos/base-foc/src/lib/base/signal_source_client.cc
[3] repos/base/src/core/include/pd_session_component.h
>
>
> Best regards,
> Denis
>
> On 11.12.2016 13:01, Denis Huber wrote:
>> Hello Norman,
>>
>>> What you observe here is the ELF loading of the child's binary. As part
>>> of the 'Child' object, the so-called '_process' member is constructed.
>>> You can find the corresponding code at
>>> 'base/src/lib/base/child_process.cc'. The code parses the ELF executable
>>> and loads the program segments, specifically the read-only text segment
>>> and the read-writable data/bss segment. For the latter, a RAM dataspace
>>> is allocated and filled with the content of the ELF binary's data. In
>>> your case, when resuming, this procedure is wrong. After all, you want
>>> to supply the checkpointed data to the new child, not the initial data
>>> provided by the ELF binary.
>>>
>>> Fortunately, I encountered the same problem when implementing fork for
>>> noux. I solved it by letting the 'Child_process' constructor accept an
>>> invalid dataspace capability as ELF argument. This has two effects:
>>> First, the ELF loading is skipped (obviously - there is no ELF to load).
>>> Second, the creation of the initial thread is skipped as well.
>>>
>>> In short, by supplying an invalid dataspace capability as binary for the
>>> new child, you avoid all those unwanted operations. The new child will
>>> not start at 'Component::construct'. You will have to manually create
>>> and start the threads of the new child via the PD and CPU session
>>> interfaces.
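The following sketch shows the control flow Norman describes. 'Dataspace_cap' and 'Child_process_model' are hypothetical stand-ins, not Genode's 'Dataspace_capability' or 'Child_process'. The sketch records which steps the constructor takes, assuming the validity check is the only switch between a fresh start and a restore.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a dataspace capability.
class Dataspace_cap {
public:
    explicit Dataspace_cap(bool valid = false) : _valid(valid) {}
    bool valid() const { return _valid; }

private:
    bool _valid;
};

// Hypothetical model of the constructor's decision: with a valid ELF
// dataspace the segments are loaded and the initial thread is created;
// with an invalid one both steps are skipped, leaving memory and thread
// setup to the caller (e.g., a restore procedure).
struct Child_process_model {
    std::vector<std::string> steps;  // records what the constructor did

    explicit Child_process_model(Dataspace_cap const &elf) {
        if (!elf.valid()) {
            steps.push_back("skip: no ELF loading, no initial thread");
            return;
        }
        steps.push_back("load ELF segments (text, data/bss)");
        steps.push_back("create and start initial thread");
    }
};
```

With `Dataspace_cap{}` (invalid), the model records only the skip step, and the restoring parent then sets up the child's threads via the PD and CPU session interfaces.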
>>
>> Thank you for the hint. I will try out your approach.
>>
>>> The approach looks good. I presume that you encounter base-foc-specific
>>> peculiarities of the thread-creation procedure. I would try to follow
>>> the code in 'base-foc/src/core/platform_thread.cc' to see what the
>>> interaction of core with the kernel looks like. The order of operations
>>> might be important.
>>>
>>> One remaining problem may be that - even though you may be able to
>>> restore most parts of the thread state - the kernel-internal state cannot
>>> be captured. E.g., think of a thread that was blocking in the kernel via
>>> 'l4_ipc_reply_and_wait' when checkpointed. When resumed, the new thread
>>> can naturally not be in this blocking state because the kernel's state
>>> is not part of the checkpointed state. The new thread would possibly
>>> start its execution at the instruction pointer of the syscall and issue
>>> the system call again, but I am not sure what really happens in practice.
>>
>> Is there a way to avoid this situation? Can I postpone the checkpoint by
>> letting the entrypoint thread finish the intercepted RPC function call,
>> and then increment the ip of the child's thread to the next instruction?
>>
>>> I think that you don't need the LOG-session quirk if you follow my
>>> suggestion to skip the ELF loading for the restored component
>>> altogether. Could you give it a try?
>>
>> You are right, the LOG-session quirk seems a bit clumsy. I prefer your
>> idea of skipping the ELF loading and the automated creation of CPU
>> threads, because it gives me control over creating and starting the
>> threads from the stored ip and sp.
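A sketch of the restore order Denis has in mind. It assumes the child's memory, including its stacks, has already been restored and that only ip and sp are needed to resume a thread. 'Thread_state', 'Cpu_thread_model' and 'restore_threads' are hypothetical. A real implementation would create the threads through the child's CPU session and would presumably have to restore further register state as well.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using addr_t = std::uintptr_t;

// Checkpointed per-thread state; only ip and sp are modeled here.
struct Thread_state {
    std::string name;
    addr_t ip;  // instruction pointer at checkpoint time
    addr_t sp;  // stack pointer at checkpoint time
};

// Hypothetical stand-in for a thread created via the CPU session;
// it only records the start call so the sketch stays self-contained.
class Cpu_thread_model {
public:
    explicit Cpu_thread_model(std::string name) : _name(std::move(name)) {}
    void start(addr_t ip, addr_t sp) { _ip = ip; _sp = sp; _started = true; }
    bool started() const { return _started; }
    addr_t ip() const { return _ip; }
    addr_t sp() const { return _sp; }
    std::string const &name() const { return _name; }

private:
    std::string _name;
    addr_t _ip = 0, _sp = 0;
    bool _started = false;
};

// Re-create each checkpointed thread and start it at its saved ip/sp.
std::vector<Cpu_thread_model> restore_threads(std::vector<Thread_state> const &saved) {
    std::vector<Cpu_thread_model> threads;
    threads.reserve(saved.size());
    for (Thread_state const &s : saved) {
        threads.emplace_back(s.name);
        threads.back().start(s.ip, s.sp);
    }
    return threads;
}
```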
>>
>>
>> Best regards,
>> Denis
>>
>> _______________________________________________
>> genode-main mailing list
>> genode-main at lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/genode-main
>>
>
--
Stefan Kalkowski
Genode Labs
https://github.com/skalk · http://genode.org/