Restoring child with checkpointed state

David Werner wernerd at ...389...
Wed Jun 21 12:12:57 CEST 2017


Hi everyone,

As I'm stuck with this problem, I would appreciate any kind of advice.

Best Regards,
David

On 07.06.2017 at 15:13, David Werner wrote:
> Hi everyone,
>
> after Denis Huber left the project, I am in charge of making our
> checkpoint/restore component work. Therefore I would like to ask some
> more questions about the IRQ kernel object.
>
>
> 1. When is the IRQ object created? Does every component have its own
> IRQ object?
>
> I tried to figure out when the IRQ object is mapped into the object
> space of a component during its startup. For that, I took a look at the
> code in [repos/base-foc/src/core/signal_source_component.cc]. The IRQ
> object appears in the object space after the "_sem =
> call<Rpc_request_semaphore>();" statement in the constructor.
>
> As far as I could follow the implementation, the "request_semaphore"
> RPC call is answered by the "Signal_source_rpc_object" in
> [base-foc/src/include/signal_source/rpc_object.h], which
> returns/delegates the native capability "_blocking_semaphore", a member
> of the "Signal_source_rpc_object". It seems to me that the IRQ object
> already exists at this point and is merely delegated to the component.
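>
> A condensed view of the pattern, as I read it (paraphrased from the two
> files mentioned above; type and member names partly from memory, so
> please correct me if this is off):
>
>    /* core side: sketch of the Signal_source_rpc_object */
>    struct Signal_source_rpc_object
>    : Genode::Rpc_object<Genode::Signal_source, Signal_source_rpc_object>
>    {
>       /* capability to the semaphore-like IRQ kernel object */
>       Genode::Native_capability _blocking_semaphore;
>
>       /* the RPC handler merely hands out the existing capability */
>       Genode::Native_capability _request_semaphore() {
>          return _blocking_semaphore; }
>    };
>
>    /* client side: the statement in the constructor after which the
>       IRQ object shows up in the component's object space */
>    _sem = call<Rpc_request_semaphore>();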
>
> But when is the IRQ object created and by whom? Is it created when a 
> new PD session is created?
>
>
>
> 2. Does the IRQ object carry any information? Do I need to checkpoint
> this information in order to recreate the object properly during a
> restore? Is the IRQ object created automatically (so that I only have
> to make sure it gets mapped into the object space of the target), or do
> I have to create it manually?
>
> In our current implementation of the restore process, we restore a
> component by recreating its sessions to core services (plus the timer),
> based on information we gathered with a custom runtime environment.
> After the sessions are restored, we place their capabilities at the
> correct positions in the object space. Will I also have to store
> information about the IRQ object somehow, or is it just an object that
> needs to exist?
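>
> To make that concrete, our restore loop currently looks roughly like
> this (the type and helper names are our own and only illustrate the
> flow):
>
>    /* for every session the checkpointer recorded for the target ... */
>    for (Stored_session const &s : checkpoint.sessions()) {
>
>       /* 1. recreate the session / RPC object at core (or the timer) */
>       Genode::Native_capability cap = recreate_session(s);
>
>       /* 2. re-insert the capability into the capability map (userspace) */
>       capability_map_insert(s.badge, cap);
>
>       /* 3. place the capability at its stored index in the
>             object space (kernel) */
>       object_space_place(s.kcap_index, cap);
>    }
>
> The IRQ object is the one entry of the original object space that this
> loop does not recreate.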
>
>
> Kind Regards,
> David
>
>
> On 29.03.2017 at 14:05, Stefan Kalkowski wrote:
>> Hello Denis,
>>
>> On 03/27/2017 04:14 PM, Denis Huber wrote:
>>> Dear Genode community,
>>>
>>> Preliminary: We implemented a Checkpoint/Restore mechanism on the
>>> basis of Genode/Fiasco.OC (thanks to the great help from all of you).
>>> We store the state of the target component by monitoring its RPC
>>> function calls, which go through the parent component (= our
>>> Checkpoint/Restore component). The capability space is indirectly
>>> checkpointed through the capability map. The state of the target is
>>> restored by recreating the RPC objects used by the target component
>>> (e.g. PD session, dataspaces, region maps, etc.). The capabilities of
>>> the restored objects also have to be restored in the capability space
>>> (kernel) and in the capability map (userspace).
>>>
>>> For restoring the target component, Norman suggested using the
>>> Genode::Child constructor with an invalid ROM dataspace capability,
>>> which does not trigger the bootstrap mechanism. Thus, we have full
>>> control over inserting the capabilities of the restored RPC objects
>>> into the capability space/map.
>>>
>>> Our problem is the following: We restore the RPC objects and insert
>>> them into the capability map and then into the capability space. From
>>> the kernel's point of view, these capabilities are all "IPC gates".
>>> Unfortunately, there was also an IRQ kernel object created by the
>>> bootstrap mechanism. The following table shows the kernel debugger
>>> output of the capability space of the freshly bootstrapped target
>>> component:
>>>
>>> 000204 :0016e* Gate   0015f* Gate   00158* Gate   00152* Gate
>>> 000208 :00154* Gate   0017e* Gate   0017f* Gate   00179* Gate
>>> 00020c :00180* Gate   00188* Gate          --            --
>>> 000210 :       --            --     0018a* Gate   0018c* Gate
>>> 000214 :0018e* Gate   00196* Gate   00145* Gate   00144* IRQ
>>> 000218 :00198* Gate          --            --            --
>>> 00021c :       --     0019c* Gate          --            --
>>>
>>> At address 000217 you can see the IRQ kernel object. What does this
>>> object do, how can we store/monitor it, and how can it be restored?
>>> Where can we find the source code which creates this object in Genode's
>>> bootstrap code?
>> The IRQ kernel object you refer to is used by the "signal_handler"
>> thread to block for signals from core's corresponding service. It is a
>> base-foc-specific internal core RPC object[1] that is used by the
>> signal handler[2], and the related capability is returned by the call
>> to 'alloc_signal_source()' provided by the PD session[3].
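>>
>> A rough sketch of that call path (simplified, types and signatures
>> abbreviated from memory):
>>
>>    /* in the component: the signal-handler thread obtains its
>>       signal source from the PD session [3] */
>>    Capability<Signal_source> source = pd.alloc_signal_source();
>>
>>    /* base-foc-specific client [2]: its constructor fetches the
>>       semaphore capability from core's Signal_source_component [1] -
>>       this is the point where the IRQ object appears in the
>>       component's object space */
>>    Signal_source_client client(source);
>>    /* internally: _sem = call<Rpc_request_semaphore>(); */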
>>
>> I have to admit, I did not follow your current implementation approach
>> in depth. Therefore, I do not know exactly how to handle this specific
>> signal handler thread and its semaphore-like IRQ object, but maybe the
>> references already help you further.
>>
>> Regards
>> Stefan
>>
>> [1] repos/base-foc/src/core/signal_source_component.cc
>> [2] repos/base-foc/src/lib/base/signal_source_client.cc
>> [3] repos/base/src/core/include/pd_session_component.h
>>>
>>> Best regards,
>>> Denis
>>>
>>> On 11.12.2016 13:01, Denis Huber wrote:
>>>> Hello Norman,
>>>>
>>>>> What you observe here is the ELF loading of the child's binary. As
>>>>> part of the 'Child' object, the so-called '_process' member is
>>>>> constructed. You can find the corresponding code at
>>>>> 'base/src/lib/base/child_process.cc'. The code parses the ELF
>>>>> executable and loads the program segments, specifically the
>>>>> read-only text segment and the read-writable data/bss segment. For
>>>>> the latter, a RAM dataspace is allocated and filled with the content
>>>>> of the ELF binary's data. In your case, when resuming, this
>>>>> procedure is wrong. After all, you want to supply the checkpointed
>>>>> data to the new child, not the initial data provided by the ELF
>>>>> binary.
>>>>>
>>>>> Fortunately, I encountered the same problem when implementing fork
>>>>> for noux. I solved it by letting the 'Child_process' constructor
>>>>> accept an invalid dataspace capability as ELF argument. This has two
>>>>> effects: First, the ELF loading is skipped (obviously - there is no
>>>>> ELF to load). And second, the creation of the initial thread is
>>>>> skipped as well.
>>>>>
>>>>> In short, by supplying an invalid dataspace capability as binary for
>>>>> the new child, you avoid all those unwanted operations. The new
>>>>> child will not start at 'Component::construct'. You will have to
>>>>> manually create and start the threads of the new child via the PD
>>>>> and CPU session interfaces.
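>>>>>
>>>>> In sketch form (signatures abbreviated and from memory, variable
>>>>> names are placeholders, so double-check against the current API):
>>>>>
>>>>>    /* an invalid (default-constructed) capability as binary means:
>>>>>       no ELF loading and no automatically created initial thread */
>>>>>    Genode::Dataspace_capability invalid_elf;
>>>>>    /* ... pass 'invalid_elf' as binary to the Child constructor ... */
>>>>>
>>>>>    /* later, recreate each thread of the child by hand via its
>>>>>       CPU session and start it at the checkpointed ip/sp */
>>>>>    Genode::Thread_capability t =
>>>>>       cpu.create_thread(child_pd_cap, "restored_thread",
>>>>>                         Genode::Affinity::Location(),
>>>>>                         Genode::Cpu_session::Weight());
>>>>>    Genode::Cpu_thread_client(t).start(stored_ip, stored_sp);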
>>>> Thank you for the hint. I will try out your approach
>>>>
>>>>> The approach looks good. I presume that you encounter
>>>>> base-foc-specific peculiarities of the thread-creation procedure. I
>>>>> would try to follow the code in
>>>>> 'base-foc/src/core/platform_thread.cc' to see what the interaction
>>>>> of core with the kernel looks like. The order of operations might be
>>>>> important.
>>>>>
>>>>> One remaining problem may be that - even though you may be able to
>>>>> restore most parts of the thread state - the kernel-internal state
>>>>> cannot be captured. E.g., think of a thread that was blocking in the
>>>>> kernel via 'l4_ipc_reply_and_wait' when checkpointed. When resumed,
>>>>> the new thread naturally cannot be in this blocking state because
>>>>> the kernel's state is not part of the checkpointed state. The new
>>>>> thread would possibly start its execution at the instruction pointer
>>>>> of the syscall and issue the system call again, but I am not sure
>>>>> what really happens in practice.
>>>> Is there a way to avoid this situation? Can I postpone the checkpoint
>>>> by letting the entrypoint thread finish the intercepted RPC function
>>>> call and then advance the ip of the child's thread to the next
>>>> instruction?
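>>>>
>>>> I am thinking of something along these lines (the state-access API is
>>>> from memory and next_instruction_after() is only a placeholder):
>>>>
>>>>    /* inspect the paused thread */
>>>>    Genode::Cpu_thread_client thread(thread_cap);
>>>>    Genode::Thread_state s = thread.state();
>>>>
>>>>    /* if s.ip points at the blocking syscall, checkpoint the address
>>>>       of the following instruction instead, so the restored thread
>>>>       does not re-enter the kernel with stale state */
>>>>    s.ip = next_instruction_after(s.ip);
>>>>    thread.state(s);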
>>>>
>>>>> I think that you don't need the LOG-session quirk if you follow my
>>>>> suggestion to skip the ELF loading for the restored component
>>>>> altogether. Could you give it a try?
>>>> You are right, the LOG-session quirk seems a bit clumsy. I prefer
>>>> your idea of skipping the ELF loading and the automated creation of
>>>> CPU threads, because it gives me the control to create and start the
>>>> threads from the stored ip and sp.
>>>>
>>>>
>>>> Best regards,
>>>> Denis
>>>>




