Page faults in managed dataspaces

Denis Huber huber.denis at ...435...
Tue Sep 27 10:48:39 CEST 2016


Again, thank you Stefan, you are a big help! :)

The test program ran successfully. Is there a date when Genode's foc 
version will be upgraded?


Kind regards,
Denis

On 27.09.2016 09:17, Stefan Kalkowski wrote:
> Hi Denis,
>
> On 09/26/2016 04:48 PM, Denis Huber wrote:
>> Hello Stefan,
>>
>> thank you for your help and finding the problem :)
>>
>> Can you tell me, how I can obtain your unofficial upgrade of foc and how
>> I can replace Genode's standard version with it?
>
> But be warned: it is unofficial, because I started the upgrade but
> stopped at some point due to time constraints. That means certain
> problems we already fixed in the older version might still exist in the
> upgrade. Moreover, it is almost completely untested. Having said this,
> you can find it in my repository, in the branch called foc_update.
> I've rebased it to the current master branch of Genode.
>
> Regards
> Stefan
>
>>
>>
>> Kind regards,
>> Denis
>>
>> On 26.09.2016 15:15, Stefan Kalkowski wrote:
>>> Hi Denis,
>>>
>>> I further examined the issue. First, I found out that it is specific to
>>> Fiasco.OC. If you use another kernel, e.g., NOVA, with the same test, it
>>> succeeds. So I instrumented the core component to always enter
>>> Fiasco.OC's kernel debugger when core unmapped the corresponding managed
>>> dataspace. When looking at the page tables, I could see that the mapping
>>> was successfully deleted. After that, I enabled all kinds of logging
>>> related to page faults and mapping operations. Lo and behold, after
>>> continuing and seeing that the "target" thread continued, I re-entered
>>> the kernel debugger and realized that the page-table entry had reappeared
>>> although the kernel did not list any activity regarding page faults and
>>> mappings. To me this is a clear kernel bug.
>>>
>>> I've tried out my unofficial upgrade to revision r67 of the Fiasco.OC
>>> kernel, and with that version it seemed to work correctly (I only tested
>>> a few rounds).
>>>
>>> I fear the currently supported version of Fiasco.OC is buggy with
>>> respect to the unmap call, at least in the way Genode has to use it.
>>>
>>> Regards
>>> Stefan
>>>
>>> On 09/26/2016 11:13 AM, Stefan Kalkowski wrote:
>>>> Hi Denis,
>>>>
>>>> I've looked into your code, and what struck me first was that you use
>>>> two threads in your server which share data (Resource::Client_resources)
>>>> without synchronization.
>>>>
>>>> I've rewritten your example server to use only one thread in a
>>>> state-machine-like fashion; have a look here:
>>>>
>>>>
>>>> https://github.com/skalk/genode-CheckpointRestore-SharedMemory/commit/d9732dcab331cecdfd4fcc5c8948d9ca23d95e84
>>>>
>>>> This way it is thread-safe and simpler (less code), and once you are
>>>> accustomed to the style, it is even easier to understand.
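>>>>
>>>> To illustrate the style (a minimal sketch under the 16.05-era APIs,
>>>> not the code from the commit above; MANAGED_SIZE and the single
>>>> sub-dataspace are illustrative): one Signal_receiver multiplexes the
>>>> page-fault signals of the managed region map and a periodic timer,
>>>> so a single thread performs both the attach and the detach:
>>>>
>>>>   #include <base/env.h>
>>>>   #include <base/signal.h>
>>>>   #include <rm_session/connection.h>
>>>>   #include <region_map/client.h>
>>>>   #include <timer_session/connection.h>
>>>>
>>>>   enum { MANAGED_SIZE = 4096 };
>>>>
>>>>   int main()
>>>>   {
>>>>       using namespace Genode;
>>>>
>>>>       /* managed dataspace: a region map with nothing attached yet */
>>>>       Rm_connection     rm;
>>>>       Region_map_client managed_rm(rm.create(MANAGED_SIZE));
>>>>
>>>>       /* designated dataspace, pre-allocated but not attached */
>>>>       Ram_dataspace_capability sub_ds =
>>>>           env()->ram_session()->alloc(MANAGED_SIZE);
>>>>
>>>>       /* one receiver for both fault and timer signals */
>>>>       Signal_receiver sig_rec;
>>>>       Signal_context  fault_ctx, timer_ctx;
>>>>       managed_rm.fault_handler(sig_rec.manage(&fault_ctx));
>>>>
>>>>       Timer::Connection timer;
>>>>       timer.sigh(sig_rec.manage(&timer_ctx));
>>>>       timer.trigger_periodic(4*1000*1000); /* 4 seconds */
>>>>
>>>>       bool attached = false;
>>>>       for (;;) {
>>>>           Signal s = sig_rec.wait_for_signal();
>>>>
>>>>           if (s.context() == &fault_ctx && !attached) {
>>>>               /* resolve the fault by attaching the designated ds */
>>>>               managed_rm.attach_at(sub_ds, (addr_t)0);
>>>>               attached = true;
>>>>           }
>>>>           if (s.context() == &timer_ctx && attached) {
>>>>               /* periodic checkpoint: empty the managed dataspace */
>>>>               managed_rm.detach((addr_t)0);
>>>>               attached = false;
>>>>           }
>>>>       }
>>>>   }
>>>>
>>>> The dataspace handed to the target would be managed_rm.dataspace();
>>>> the Resource-session plumbing is omitted here.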
>>>>
>>>> Nevertheless, although the possible synchronization problems are
>>>> eliminated by design, the problem you describe remains. I'll take a
>>>> deeper look at our attach/detach implementation for managed dataspaces,
>>>> but I cannot promise that this will happen soon.
>>>>
>>>> Best regards
>>>> Stefan
>>>>
>>>> On 09/26/2016 10:44 AM, Sebastian Sumpf wrote:
>>>>> Hey Denis,
>>>>>
>>>>> On 09/24/2016 06:20 PM, Denis Huber wrote:
>>>>>> Dear Genode Community,
>>>>>>
>>>>>> perhaps the wall of text makes the problem a bit discouraging to
>>>>>> tackle. Let me summarize the important facts of the scenario:
>>>>>>
>>>>>> * Two components 'ckpt' and 'target'
>>>>>> * ckpt receives a thread capability of target's main thread
>>>>>> * ckpt shares a managed dataspace with target
>>>>>>    * this managed dataspace is initially empty
>>>>>>
>>>>>> target's behaviour:
>>>>>> * target periodically reads and writes from/to the managed dataspace
>>>>>> * target causes page faults (pf) which are handled by ckpt's pf handler
>>>>>> thread
>>>>>>    * pf handler attaches a pre-allocated dataspace to the managed
>>>>>> dataspace and resolves the pf
>>>>>>
>>>>>> ckpt's behaviour:
>>>>>> * ckpt periodically detaches all attached dataspaces from the managed
>>>>>> dataspace
>>>>>>
>>>>>> Outcome:
>>>>>> After two successful cycles (pf->attach->detach -> pf->attach->detach)
>>>>>> the target does not cause a pf, but reads and writes to the managed
>>>>>> dataspace although it is (theoretically) empty.
>>>>>>
>>>>>> I used Genode 16.05 with a foc_pbxa9 build. Can somebody help me with
>>>>>> this problem? I actually have no idea what the cause could be.
>>>>>>
>>>>>>
>>>>>
>>>>> You are programming on fairly untested ground here. There might still
>>>>> be bugs or corner cases in this part of the code, so someone will have
>>>>> to look into it (while we are very busy right now). Your problem is
>>>>> reproducible with [4], right?
>>>>>
>>>>> By the way, your way of reporting is exceptional: the more information
>>>>> and actual test code we have, the better we can debug problems. So
>>>>> please keep it this way, even though we might not read all of it at
>>>>> times ;)
>>>>>
>>>>> Regards and if I find the time, I will look into your issue,
>>>>>
>>>>> Sebastian
>>>>>
>>>>>>
>>>>>>
>>>>>> On 19.09.2016 15:01, Denis Huber wrote:
>>>>>>> Dear Genode Community,
>>>>>>>
>>>>>>> I want to implement a mechanism to monitor a component's accesses to
>>>>>>> its address space.
>>>>>>>
>>>>>>> My idea is to implement a monitoring component which provides managed
>>>>>>> dataspaces to a target component. Each managed dataspace has several
>>>>>>> designated dataspaces (allocated, but not attached, each with a fixed
>>>>>>> location in the managed dataspace). I want to use several dataspaces
>>>>>>> to control which ranges the target component can access.
>>>>>>>
>>>>>>> Whenever the target component accesses an address in the managed
>>>>>>> dataspace, a page fault is triggered, because the managed dataspace has
>>>>>>> no dataspaces attached to it. The page fault is caught by a custom page
>>>>>>> fault handler, which attaches the designated dataspace to the faulting
>>>>>>> managed dataspace and thereby resolves the page fault.
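>>>>>>>
>>>>>>> A sketch of such a handler step (assuming the 16.05-era Region_map
>>>>>>> interface; managed_rm is the Region_map_client of the managed
>>>>>>> dataspace, and designated_ds_for() is a hypothetical lookup of the
>>>>>>> designated dataspace for a given location):
>>>>>>>
>>>>>>>   #include <base/env.h>
>>>>>>>   #include <region_map/client.h>
>>>>>>>
>>>>>>>   /* hypothetical helper: designated dataspace for a location */
>>>>>>>   Genode::Dataspace_capability designated_ds_for(Genode::addr_t);
>>>>>>>
>>>>>>>   void handle_fault(Genode::Region_map_client &managed_rm)
>>>>>>>   {
>>>>>>>       using namespace Genode;
>>>>>>>
>>>>>>>       /* on a fault signal, query the managed region map's state */
>>>>>>>       Region_map::State state = managed_rm.state();
>>>>>>>
>>>>>>>       if (state.type == Region_map::State::READY)
>>>>>>>           return; /* spurious signal, no pending fault */
>>>>>>>
>>>>>>>       /* page-align the faulting address */
>>>>>>>       addr_t addr = state.addr & ~(addr_t)0xfff;
>>>>>>>
>>>>>>>       /* attaching a dataspace that covers the fault address
>>>>>>>          resolves the fault, and core resumes the faulter */
>>>>>>>       managed_rm.attach_at(designated_ds_for(addr), addr);
>>>>>>>   }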
>>>>>>>
>>>>>>> To test my concept I implemented a prototypical system with a monitoring
>>>>>>> component (called "ckpt") [1] and a target component [2].
>>>>>>>
>>>>>>> [1]
>>>>>>> https://github.com/702nADOS/genode-CheckpointRestore-SharedMemory/blob/b502ffd962a87a5f9f790808b13554d6568f6d0b/src/test/concept_session_rm/server/main.cc
>>>>>>> [2]
>>>>>>> https://github.com/702nADOS/genode-CheckpointRestore-SharedMemory/blob/b502ffd962a87a5f9f790808b13554d6568f6d0b/src/test/concept_session_rm/client/main.cc
>>>>>>>
>>>>>>> The monitoring component provides a service [3] through which it
>>>>>>> receives a Thread capability (used to pause the target component before
>>>>>>> detaching the dataspaces and to resume it afterwards) and hands out a
>>>>>>> managed dataspace to the client.
>>>>>>>
>>>>>>> [3]
>>>>>>> https://github.com/702nADOS/genode-CheckpointRestore-SharedMemory/tree/b502ffd962a87a5f9f790808b13554d6568f6d0b/include/resource_session
>>>>>>>
>>>>>>> The monitoring component runs a main loop which pauses the client's main
>>>>>>> thread and detaches all attached dataspaces from the managed dataspace.
>>>>>>> The target component also runs a main loop which prints (reads) a number
>>>>>>> from the managed dataspace to the console and increments (writes) it in
>>>>>>> the managed dataspace.
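>>>>>>>
>>>>>>> The target side might look like this minimal sketch (hypothetical
>>>>>>> names, not my actual code; the managed-dataspace capability is
>>>>>>> assumed to have been obtained through the Resource session, and the
>>>>>>> legacy env() interface of Genode 16.05 is used):
>>>>>>>
>>>>>>>   #include <base/env.h>
>>>>>>>   #include <base/printf.h>
>>>>>>>   #include <dataspace/capability.h>
>>>>>>>   #include <timer_session/connection.h>
>>>>>>>
>>>>>>>   static void main_loop(Genode::Dataspace_capability managed_ds)
>>>>>>>   {
>>>>>>>       /* attach the managed dataspace to the local address space */
>>>>>>>       unsigned *value =
>>>>>>>           Genode::env()->rm_session()->attach(managed_ds);
>>>>>>>
>>>>>>>       Timer::Connection timer;
>>>>>>>       for (;;) {
>>>>>>>           Genode::printf("%u\n", *value); /* read: should fault
>>>>>>>                                              while detached */
>>>>>>>           (*value)++;                     /* write it back */
>>>>>>>           timer.msleep(1000);
>>>>>>>       }
>>>>>>>   }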
>>>>>>>
>>>>>>> The run script is found here [4].
>>>>>>>
>>>>>>> [4]
>>>>>>> https://github.com/702nADOS/genode-CheckpointRestore-SharedMemory/blob/b502ffd962a87a5f9f790808b13554d6568f6d0b/run/concept_session_rm.run
>>>>>>>
>>>>>>> The scenario works for the first 3 iterations of the monitoring
>>>>>>> component: every 4 seconds it detaches the dataspaces from the managed
>>>>>>> dataspace and afterwards resolves the page faults by attaching the
>>>>>>> dataspaces back. After the 3rd iteration, the target component accesses
>>>>>>> the theoretically empty managed dataspace but does not trigger a page
>>>>>>> fault. In fact, it reads and writes to the designated dataspaces as if
>>>>>>> they were attached.
>>>>>>>
>>>>>>> By running the run script I get the following output:
>>>>>>> [init -> target] Initialization started
>>>>>>> [init -> target] Requesting session to Resource service
>>>>>>> [init -> ckpt] Initialization started
>>>>>>> [init -> ckpt] Creating page fault handler thread
>>>>>>> [init -> ckpt] Announcing Resource service
>>>>>>> [init -> target] Sending main thread cap
>>>>>>> [init -> target] Requesting dataspace cap
>>>>>>> [init -> target] Attaching dataspace cap
>>>>>>> [init -> target] Initialization ended
>>>>>>> [init -> target] Starting main loop
>>>>>>> Genode::Pager_entrypoint::entry()::<lambda(Genode::Pager_object*)>:Could
>>>>>>> not resolve pf=6000 ip=10034bc
>>>>>>> [init -> ckpt] Initialization ended
>>>>>>> [init -> ckpt] Starting main loop
>>>>>>> [init -> ckpt] Waiting for page faults
>>>>>>> [init -> ckpt] Handling page fault: READ_FAULT pf_addr=0x00000000
>>>>>>> [init -> ckpt]   attached sub_ds0 at address 0x00000000
>>>>>>> [init -> ckpt] Waiting for page faults
>>>>>>> [init -> target] 0
>>>>>>> [init -> target] 1
>>>>>>> [init -> target] 2
>>>>>>> [init -> target] 3
>>>>>>> [init -> ckpt] Iteration #0
>>>>>>> [init -> ckpt]   valid thread
>>>>>>> [init -> ckpt]   detaching sub_ds_cap0
>>>>>>> [init -> ckpt]   sub_ds_cap1 already detached
>>>>>>> Genode::Pager_entrypoint::entry()::<lambda(Genode::Pager_object*)>:Could
>>>>>>> not resolve pf=6000 ip=10034bc
>>>>>>> [init -> ckpt] Handling page fault: READ_FAULT pf_addr=0x00000000
>>>>>>> [init -> ckpt]   attached sub_ds0 at address 0x00000000
>>>>>>> [init -> ckpt] Waiting for page faults
>>>>>>> [init -> target] 4
>>>>>>> [init -> target] 5
>>>>>>> [init -> target] 6
>>>>>>> [init -> target] 7
>>>>>>> [init -> ckpt] Iteration #1
>>>>>>> [init -> ckpt]   valid thread
>>>>>>> [init -> ckpt]   detaching sub_ds_cap0
>>>>>>> [init -> ckpt]   sub_ds_cap1 already detached
>>>>>>> [init -> target] 8
>>>>>>> [init -> target] 9
>>>>>>> [init -> target] 10
>>>>>>> [init -> target] 11
>>>>>>> [init -> ckpt] Iteration #2
>>>>>>> [init -> ckpt]   valid thread
>>>>>>> [init -> ckpt]   sub_ds_cap0 already detached
>>>>>>> [init -> ckpt]   sub_ds_cap1 already detached
>>>>>>> [init -> target] 12
>>>>>>> [init -> target] 13
>>>>>>>
>>>>>>> As you can see, after "Iteration #1" ended, no page fault occurred,
>>>>>>> although the target component printed and incremented the integer
>>>>>>> stored in the managed dataspace.
>>>>>>>
>>>>>>> Could it be that the detach method was not executed correctly?
>>>>>>>
>>>>>>>
>>>>>>> Kind regards
>>>>>>> Denis
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>