Page faults in managed dataspaces

Stefan Kalkowski stefan.kalkowski at ...1...
Tue Sep 27 09:17:58 CEST 2016


Hi Denis,

On 09/26/2016 04:48 PM, Denis Huber wrote:
> Hello Stefan,
> 
> thank you for your help and for finding the problem :)
> 
> Can you tell me how I can obtain your unofficial upgrade of foc and how 
> I can replace Genode's standard version with it?

But be warned: it is unofficial because I started the upgrade but
stopped at some point due to time constraints. That means certain
problems we already fixed in the older version might still exist in the
upgrade. Moreover, it is almost completely untested. Having said this,
you can find it in my repository in the branch called foc_update.
I've rebased it onto the current master branch of Genode.

Regards
Stefan

> 
> 
> Kind regards,
> Denis
> 
> On 26.09.2016 15:15, Stefan Kalkowski wrote:
>> Hi Denis,
>>
>> I further examined the issue. First, I found out that it is specific to
>> Fiasco.OC. If you use another kernel, e.g., NOVA, the same test
>> succeeds. So I instrumented the core component to always enter
>> Fiasco.OC's kernel debugger when core unmapped the corresponding managed
>> dataspace. When looking at the page tables, I could see that the mapping
>> had been successfully deleted. After that I enabled all kinds of logging
>> related to page faults and mapping operations. Lo and behold, after
>> continuing and seeing that the "target" thread carried on, I re-entered
>> the kernel debugger and realized that the page-table entry had reappeared,
>> although the kernel did not report any activity regarding page faults and
>> mappings. To me this is a clear kernel bug.
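>>
>> In case you want to reproduce the instrumentation: I merely placed a
>> call to the kernel debugger into core's unmap path, roughly like this
>> (the exact location in base-foc is from memory; enter_kdebug() is a
>> macro provided by the Fiasco.OC system headers):
>>
>>   /* debugging aid only, added to core's code path that unmaps the
>>      managed dataspace */
>>   #include <l4/sys/kdebug.h>  /* provides the enter_kdebug() macro */
>>
>>   ...
>>   /* right after the mapping got revoked */
>>   enter_kdebug("core: unmapped managed dataspace"); /* drops into JDB */
>>
>> Within JDB you can then dump the target task's page table and check
>> whether the entry is really gone.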
>>
>> I've tried out my unofficial upgrade to revision r67 of the Fiasco.OC
>> kernel, and with that version everything seemed to work correctly (I
>> only tested a few rounds).
>>
>> I fear the currently supported version of Fiasco.OC is buggy with
>> respect to the unmap call, at least in the way Genode has to use it.
>>
>> Regards
>> Stefan
>>
>> On 09/26/2016 11:13 AM, Stefan Kalkowski wrote:
>>> Hi Denis,
>>>
>>> I've looked into your code, and what struck me first was that you use
>>> two threads in your server that share data (Resource::Client_resources)
>>> without any synchronization.
>>>
>>> I've rewritten your example server to only use one thread in a
>>> state-machine-like fashion; have a look here:
>>>
>>>
>>> https://github.com/skalk/genode-CheckpointRestore-SharedMemory/commit/d9732dcab331cecdfd4fcc5c8948d9ca23d95e84
>>>
>>> This way it is thread-safe and simpler (less code), and once you are
>>> accustomed to it, it becomes even easier to understand.
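>>>
>>> The skeleton of that pattern looks roughly like this (just a sketch
>>> assuming the signal-handler API of the 16.05 release, not the actual
>>> code of the commit above; sizes and names are purely illustrative):
>>>
>>>   #include <base/component.h>
>>>   #include <base/signal.h>
>>>   #include <rm_session/connection.h>
>>>   #include <region_map/client.h>
>>>
>>>   struct Main
>>>   {
>>>     Genode::Env               &env;
>>>     Genode::Rm_connection       rm         { env };
>>>     Genode::Region_map_client   managed_rm { rm.create(2*4096) };
>>>
>>>     /* page faults arrive as signals at the one and only entrypoint */
>>>     Genode::Signal_handler<Main> fault_handler {
>>>       env.ep(), *this, &Main::handle_fault };
>>>
>>>     void handle_fault()
>>>     {
>>>       Genode::Region_map::State state = managed_rm.state();
>>>       if (state.type != Genode::Region_map::State::READY) {
>>>         /* attach the designated dataspace at state.addr here */
>>>       }
>>>     }
>>>
>>>     Main(Genode::Env &env) : env(env)
>>>     {
>>>       managed_rm.fault_handler(fault_handler);
>>>       /* announce the "Resource" service at the same entrypoint, so
>>>          RPCs and fault signals are serialized by one thread */
>>>     }
>>>   };
>>>
>>>   Genode::size_t Component::stack_size() { return 16*1024; }
>>>   void Component::construct(Genode::Env &env) { static Main main(env); }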
>>>
>>> Nevertheless, although the possible synchronization problems are
>>> eliminated by design, the problem you described remains. I'll have a
>>> deeper look into our attach/detach implementation of managed dataspaces,
>>> but I cannot promise that this will happen soon.
>>>
>>> Best regards
>>> Stefan
>>>
>>> On 09/26/2016 10:44 AM, Sebastian Sumpf wrote:
>>>> Hey Denis,
>>>>
>>>> On 09/24/2016 06:20 PM, Denis Huber wrote:
>>>>> Dear Genode Community,
>>>>>
>>>>> perhaps the wall of text is a bit discouraging when trying to tackle
>>>>> the problem. Let me summarize the important facts of the scenario:
>>>>>
>>>>> * Two components 'ckpt' and 'target'
>>>>> * ckpt receives a capability to target's main thread
>>>>> * ckpt shares a managed dataspace with target
>>>>>    * this managed dataspace is initially empty
>>>>>
>>>>> target's behaviour:
>>>>> * target periodically reads from and writes to the managed dataspace
>>>>> * target thereby causes page faults (pf), which are handled by ckpt's
>>>>> pf-handler thread
>>>>>    * the pf handler attaches a pre-allocated dataspace to the managed
>>>>> dataspace and thereby resolves the pf
>>>>>
>>>>> ckpt's behaviour:
>>>>> * ckpt periodically detaches all attached dataspaces from the managed
>>>>> dataspace
>>>>>
>>>>> Outcome:
>>>>> After two successful cycles (pf -> attach -> detach, pf -> attach ->
>>>>> detach), the target no longer causes a pf, but still reads from and
>>>>> writes to the managed dataspace, although it is (theoretically) empty.
>>>>>
>>>>> I used Genode 16.05 with a foc_pbxa9 build. Can somebody help me with
>>>>> this problem? I actually have no idea what its cause could be.
>>>>>
>>>>>
>>>>
>>>> You are programming on fairly untested ground here. There still might
>>>> be bugs or corner cases in this area of the code, so someone might have
>>>> to look into it (while we are very busy right now). Your problem is
>>>> reproducible with [4], right?
>>>>
>>>> By the way, your way of reporting is exceptional: the more information
>>>> and actual test code we have, the better we can debug problems. So
>>>> please keep it this way, even though we might not read all of it at
>>>> times ;)
>>>>
>>>> Regards, and if I find the time, I will look into your issue,
>>>>
>>>> Sebastian
>>>>
>>>>>
>>>>>
>>>>> On 19.09.2016 15:01, Denis Huber wrote:
>>>>>> Dear Genode Community,
>>>>>>
>>>>>> I want to implement a mechanism to monitor a component's accesses to
>>>>>> its own address space.
>>>>>>
>>>>>> My idea is to implement a monitoring component that provides managed
>>>>>> dataspaces to a target component. Each managed dataspace has several
>>>>>> designated dataspaces (allocated, but not attached, and each with a
>>>>>> fixed location inside the managed dataspace). I use several dataspaces
>>>>>> in order to control the access range of the target component.
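>>>>>>
>>>>>> For clarity, the managed dataspace and its designated dataspaces are
>>>>>> created roughly like this (simplified from [1], sizes are only
>>>>>> examples):
>>>>>>
>>>>>>   #include <base/env.h>
>>>>>>   #include <rm_session/connection.h>
>>>>>>   #include <region_map/client.h>
>>>>>>
>>>>>>   /* managed dataspace = region map with nothing attached, yet */
>>>>>>   Genode::Rm_connection        rm;
>>>>>>   Genode::Region_map_client    managed_rm(rm.create(2*4096));
>>>>>>   Genode::Dataspace_capability managed_ds = managed_rm.dataspace();
>>>>>>
>>>>>>   /* designated dataspaces: allocated, but not attached */
>>>>>>   Genode::Ram_dataspace_capability sub_ds0 =
>>>>>>     Genode::env()->ram_session()->alloc(4096);
>>>>>>   Genode::Ram_dataspace_capability sub_ds1 =
>>>>>>     Genode::env()->ram_session()->alloc(4096);
>>>>>>
>>>>>> The 'managed_ds' capability is what the target component receives and
>>>>>> attaches to its own address space.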
>>>>>>
>>>>>> Whenever the target component accesses an address in the managed
>>>>>> dataspace, a page fault is triggered, because the managed dataspace has
>>>>>> no dataspaces attached to it. The page fault is caught by a custom
>>>>>> page-fault handler, which attaches the designated dataspace to the
>>>>>> faulting managed dataspace and thereby resolves the page fault.
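>>>>>>
>>>>>> The page-fault handler thread essentially does the following
>>>>>> (simplified; 'managed_rm' and 'sub_ds0' are the objects from the
>>>>>> sketch above, the real code handling both sub-dataspaces is in [1]):
>>>>>>
>>>>>>   /* receive fault signals of the managed dataspace's region map */
>>>>>>   Genode::Signal_receiver receiver;
>>>>>>   Genode::Signal_context  context;
>>>>>>   managed_rm.fault_handler(receiver.manage(&context));
>>>>>>
>>>>>>   while (true) {
>>>>>>     receiver.wait_for_signal();
>>>>>>
>>>>>>     Genode::Region_map::State state = managed_rm.state();
>>>>>>     if (state.type == Genode::Region_map::State::READY)
>>>>>>       continue;
>>>>>>
>>>>>>     /* attach the designated dataspace at the page-aligned fault
>>>>>>        address, which resolves the fault and resumes the target */
>>>>>>     Genode::addr_t addr = state.addr & ~(4096UL - 1);
>>>>>>     managed_rm.attach(sub_ds0, 0, 0, true, addr);
>>>>>>   }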
>>>>>>
>>>>>> To test my concept I implemented a prototypical system with a monitoring
>>>>>> component (called "ckpt") [1] and a target component [2].
>>>>>>
>>>>>> [1]
>>>>>> https://github.com/702nADOS/genode-CheckpointRestore-SharedMemory/blob/b502ffd962a87a5f9f790808b13554d6568f6d0b/src/test/concept_session_rm/server/main.cc
>>>>>> [2]
>>>>>> https://github.com/702nADOS/genode-CheckpointRestore-SharedMemory/blob/b502ffd962a87a5f9f790808b13554d6568f6d0b/src/test/concept_session_rm/client/main.cc
>>>>>>
>>>>>> The monitoring component provides a service [3] through which it
>>>>>> receives a Thread capability (used to pause the target component before
>>>>>> detaching a dataspace and to resume it afterwards) and provides a
>>>>>> managed dataspace to the client.
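>>>>>>
>>>>>> In essence, the session interface boils down to two RPC functions (a
>>>>>> simplified sketch; the exact names and types are in [3]):
>>>>>>
>>>>>>   #include <session/session.h>
>>>>>>   #include <base/rpc.h>
>>>>>>   #include <cpu_session/cpu_session.h>
>>>>>>   #include <dataspace/capability.h>
>>>>>>
>>>>>>   struct Resource_session : Genode::Session
>>>>>>   {
>>>>>>     static const char *service_name() { return "Resource"; }
>>>>>>
>>>>>>     /* hand the target's main-thread capability to the monitor */
>>>>>>     virtual void send_thread(Genode::Thread_capability thread) = 0;
>>>>>>
>>>>>>     /* obtain the managed dataspace provided by the monitor */
>>>>>>     virtual Genode::Dataspace_capability dataspace() = 0;
>>>>>>
>>>>>>     GENODE_RPC(Rpc_send_thread, void, send_thread,
>>>>>>                Genode::Thread_capability);
>>>>>>     GENODE_RPC(Rpc_dataspace, Genode::Dataspace_capability, dataspace);
>>>>>>     GENODE_RPC_INTERFACE(Rpc_send_thread, Rpc_dataspace);
>>>>>>   };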
>>>>>>
>>>>>> [3]
>>>>>> https://github.com/702nADOS/genode-CheckpointRestore-SharedMemory/tree/b502ffd962a87a5f9f790808b13554d6568f6d0b/include/resource_session
>>>>>>
>>>>>> The monitoring component runs a main loop that pauses the client's main
>>>>>> thread and detaches all attached dataspaces from the managed dataspace.
>>>>>> The target component also runs a main loop, which prints (reads) a number
>>>>>> from the managed dataspace to the console and increments (writes) it in
>>>>>> the managed dataspace.
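>>>>>>
>>>>>> The monitor's main loop therefore looks roughly like this
>>>>>> (simplified; pausing/resuming the target's main thread and the
>>>>>> bookkeeping of what is currently attached are omitted, see [1]):
>>>>>>
>>>>>>   #include <timer_session/connection.h>
>>>>>>
>>>>>>   Timer::Connection timer;
>>>>>>
>>>>>>   while (true) {
>>>>>>     timer.msleep(4000);
>>>>>>
>>>>>>     /* pause the target's main thread (capability received via [3]) */
>>>>>>
>>>>>>     /* detach the designated dataspaces from the managed dataspace;
>>>>>>        the real code only detaches what is actually attached */
>>>>>>     managed_rm.detach(0x0);
>>>>>>     managed_rm.detach(0x1000);
>>>>>>
>>>>>>     /* resume the target's main thread */
>>>>>>   }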
>>>>>>
>>>>>> The run script is found here [4].
>>>>>>
>>>>>> [4]
>>>>>> https://github.com/702nADOS/genode-CheckpointRestore-SharedMemory/blob/b502ffd962a87a5f9f790808b13554d6568f6d0b/run/concept_session_rm.run
>>>>>>
>>>>>> The scenario works for the first 3 iterations of the monitoring
>>>>>> component: every 4 seconds it detaches the dataspaces from the managed
>>>>>> dataspace and afterwards resolves the resulting page faults by attaching
>>>>>> the dataspaces again. After the 3rd iteration, the target component
>>>>>> accesses the theoretically empty managed dataspace, but does not trigger
>>>>>> a page fault. In fact, it reads from and writes to the designated
>>>>>> dataspaces as if they were still attached.
>>>>>>
>>>>>> By running the run script I get the following output:
>>>>>> [init -> target] Initialization started
>>>>>> [init -> target] Requesting session to Resource service
>>>>>> [init -> ckpt] Initialization started
>>>>>> [init -> ckpt] Creating page fault handler thread
>>>>>> [init -> ckpt] Announcing Resource service
>>>>>> [init -> target] Sending main thread cap
>>>>>> [init -> target] Requesting dataspace cap
>>>>>> [init -> target] Attaching dataspace cap
>>>>>> [init -> target] Initialization ended
>>>>>> [init -> target] Starting main loop
>>>>>> Genode::Pager_entrypoint::entry()::<lambda(Genode::Pager_object*)>:Could
>>>>>> not resolve pf=6000 ip=10034bc
>>>>>> [init -> ckpt] Initialization ended
>>>>>> [init -> ckpt] Starting main loop
>>>>>> [init -> ckpt] Waiting for page faults
>>>>>> [init -> ckpt] Handling page fault: READ_FAULT pf_addr=0x00000000
>>>>>> [init -> ckpt]   attached sub_ds0 at address 0x00000000
>>>>>> [init -> ckpt] Waiting for page faults
>>>>>> [init -> target] 0
>>>>>> [init -> target] 1
>>>>>> [init -> target] 2
>>>>>> [init -> target] 3
>>>>>> [init -> ckpt] Iteration #0
>>>>>> [init -> ckpt]   valid thread
>>>>>> [init -> ckpt]   detaching sub_ds_cap0
>>>>>> [init -> ckpt]   sub_ds_cap1 already detached
>>>>>> Genode::Pager_entrypoint::entry()::<lambda(Genode::Pager_object*)>:Could
>>>>>> not resolve pf=6000 ip=10034bc
>>>>>> [init -> ckpt] Handling page fault: READ_FAULT pf_addr=0x00000000
>>>>>> [init -> ckpt]   attached sub_ds0 at address 0x00000000
>>>>>> [init -> ckpt] Waiting for page faults
>>>>>> [init -> target] 4
>>>>>> [init -> target] 5
>>>>>> [init -> target] 6
>>>>>> [init -> target] 7
>>>>>> [init -> ckpt] Iteration #1
>>>>>> [init -> ckpt]   valid thread
>>>>>> [init -> ckpt]   detaching sub_ds_cap0
>>>>>> [init -> ckpt]   sub_ds_cap1 already detached
>>>>>> [init -> target] 8
>>>>>> [init -> target] 9
>>>>>> [init -> target] 10
>>>>>> [init -> target] 11
>>>>>> [init -> ckpt] Iteration #2
>>>>>> [init -> ckpt]   valid thread
>>>>>> [init -> ckpt]   sub_ds_cap0 already detached
>>>>>> [init -> ckpt]   sub_ds_cap1 already detached
>>>>>> [init -> target] 12
>>>>>> [init -> target] 13
>>>>>>
>>>>>> As you can see: After "iteration #1" ended, no page fault was caused,
>>>>>> although the target component printed and incremented the integer stored
>>>>>> in the managed dataspace.
>>>>>>
>>>>>> Could it be that the detach method was not executed correctly?
>>>>>>
>>>>>>
>>>>>> Kind regards
>>>>>> Denis
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
> 

-- 
Stefan Kalkowski
Genode Labs

https://github.com/skalk ยท http://genode.org/



