Page faults in managed dataspaces

Stefan Kalkowski stefan.kalkowski at ...1...
Mon Sep 26 15:15:55 CEST 2016


Hi Denis,

I examined the issue further. First, I found out that it is specific to
Fiasco.OC. If you run the same test on another kernel, e.g., NOVA, it
succeeds. So I instrumented the core component to always enter
Fiasco.OC's kernel debugger when core unmapped the corresponding managed
dataspace. Looking at the page tables, I could see that the mapping had
been successfully deleted. After that, I enabled all kinds of logging
related to page faults and mapping operations. Lo and behold, after
continuing and seeing that the "target" thread kept running, I
re-entered the kernel debugger and realized that the page-table entry
had reappeared, although the kernel did not list any activity regarding
page faults and mappings. To me, this is a clear kernel bug.
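
For reference, the instrumentation boils down to something like the
following minimal sketch. It assumes Fiasco.OC's enter_kdebug() hook
from l4/sys/kdebug.h; the place in core's unmap path is simplified:

#include <l4/sys/kdebug.h>

static void unmap_hook(unsigned long /* core_local_addr */,
                       unsigned     /* size_log2 */)
{
    /* ... the actual unmap of the flexpage happens here ... */

    /* stop in the kernel debugger to inspect the page tables */
    enter_kdebug("after unmap");
}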

I've tried out my unofficial upgrade to revision r67 of the Fiasco.OC
kernel, and with that version it seemed to work correctly (I only
tested a few rounds).

I fear the currently supported version of Fiasco.OC is buggy with
respect to the unmap call, at least in the way Genode has to use it.
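
For context, the call in question is Fiasco.OC's l4_task_unmap(), which
core uses to revoke mappings it handed out to other address spaces. A
minimal sketch (the address is a placeholder):

#include <l4/sys/task.h>

/* revoke a 4K page at 'addr' from all tasks that received the mapping
 * from us, while keeping our own copy intact */
static void revoke_page(l4_addr_t addr)
{
    l4_task_unmap(L4_BASE_TASK_CAP,
                  l4_fpage(addr, L4_PAGESHIFT, L4_FPAGE_RWX),
                  L4_FP_OTHER_SPACES);
}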

Regards
Stefan

On 09/26/2016 11:13 AM, Stefan Kalkowski wrote:
> Hi Denis,
> 
> I've looked into your code, and what struck me first was that you use
> two threads in your server, which share data
> (Resource::Client_resources) between them without synchronization.
> 
> I've rewritten your example server to only use one thread in a
> state-machine like fashion, have a look here:
> 
> 
> https://github.com/skalk/genode-CheckpointRestore-SharedMemory/commit/d9732dcab331cecdfd4fcc5c8948d9ca23d95e84
> 
> This way it is thread-safe and simpler (less code), and once you are
> accustomed to this style, it becomes even easier to understand.
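>
> For illustration, the basic pattern looks like this - a minimal sketch
> assuming the Signal_receiver/Signal_dispatcher API of Genode 16.05,
> with placeholder handler bodies:
>
> #include <base/signal.h>
>
> using namespace Genode;
>
> struct Main
> {
>     Signal_receiver receiver;
>
>     /* one dispatcher per event source, all served by one thread */
>     Signal_dispatcher<Main> fault_dispatcher {
>         receiver, *this, &Main::handle_fault };
>     Signal_dispatcher<Main> timeout_dispatcher {
>         receiver, *this, &Main::handle_timeout };
>
>     void handle_fault(unsigned)   { /* attach ds, resolve fault */ }
>     void handle_timeout(unsigned) { /* pause target, detach ds */ }
>
>     void run()
>     {
>         /* every event is handled sequentially by this one thread */
>         for (;;) {
>             Signal sig = receiver.wait_for_signal();
>             Signal_dispatcher_base *dispatcher =
>                 dynamic_cast<Signal_dispatcher_base *>(sig.context());
>             if (dispatcher)
>                 dispatcher->dispatch(sig.num());
>         }
>     }
> };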
> 
> Nevertheless, although the possible synchronization problems are
> eliminated by design, the problem you described remains. I'll take a
> deeper look into our attach/detach implementation of managed
> dataspaces, but I cannot promise that this will happen soon.
> 
> Best regards
> Stefan
> 
> On 09/26/2016 10:44 AM, Sebastian Sumpf wrote:
>> Hey Denis,
>>
>> On 09/24/2016 06:20 PM, Denis Huber wrote:
>>> Dear Genode Community,
>>>
>>> perhaps the wall of text is a bit discouraging when tackling the
>>> problem. Let me summarize the important facts of the scenario:
>>>
>>> * Two components 'ckpt' and 'target'
>>> * ckpt shares a thread capability of target's main thread
>>> * ckpt shares a managed dataspace with target
>>>    * this managed dataspace is initially empty
>>>
>>> target's behaviour:
>>> * target periodically reads and writes from/to the managed dataspace
>>> * target causes page faults (pf) which are handled by ckpt's pf handler 
>>> thread
>>>    * pf handler attaches a pre-allocated dataspace to the managed 
>>> dataspace and resolves the pf
>>>
>>> ckpt's behaviour:
>>> * ckpt periodically detaches all attached dataspaces from the managed
>>> dataspace (see the sketch below)
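>>>
>>> For concreteness, the detach step amounts to something like this (a
>>> sketch against the Genode 16.05 region-map interface; the capability
>>> and the attach address are placeholders):
>>>
>>> #include <region_map/client.h>
>>>
>>> /* detach a designated dataspace from the managed dataspace;
>>>  * 'rm_cap' denotes the region map behind the managed dataspace,
>>>  * 'attached_at' is where the fault handler attached the sub ds */
>>> void detach_designated(Genode::Capability<Genode::Region_map> rm_cap,
>>>                        Genode::addr_t attached_at)
>>> {
>>>     Genode::Region_map_client managed_rm(rm_cap);
>>>     managed_rm.detach(attached_at);
>>> }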
>>>
>>> Outcome:
>>> After two successful cycles (pf -> attach -> detach, pf -> attach ->
>>> detach), the target does not cause a pf but reads and writes to the
>>> managed dataspace although it is (theoretically) empty.
>>>
>>> I used Genode 16.05 with a foc_pbxa9 build. Can somebody help me? I
>>> actually have no idea what is causing this behaviour.
>>>
>>>
>>
>> You are treading on fairly untested ground here. There might still be
>> bugs or corner cases in this part of the code, so someone might have
>> to look into it (while we are very busy right now). Your problem is
>> reproducible with [4], right?
>>
>> By the way, your way of reporting is exceptional: the more information
>> and actual test code we have, the better we can debug problems. So
>> please keep it this way, even though we might not read all of it at
>> times ;)
>>
>> Regards and if I find the time, I will look into your issue,
>>
>> Sebastian
>>
>>>
>>>
>>> On 19.09.2016 15:01, Denis Huber wrote:
>>>> Dear Genode Community,
>>>>
>>>> I want to implement a mechanism to monitor the access of a component to
>>>> its address space.
>>>>
>>>> My idea is to implement a monitoring component which provides managed
>>>> dataspaces to a target component. Each managed dataspace has several
>>>> designated dataspaces (allocated, but not attached, and with a fixed
>>>> location in the managed dataspace). I want to use several dataspaces to
>>>> control the access range of the target component.
>>>>
>>>> Whenever the target component accesses an address in the managed
>>>> dataspace, a page fault is triggered, because the managed dataspace has
>>>> no dataspaces attached to it. The page fault is caught by a custom page
>>>> fault handler. The page fault handler attaches the designated dataspace
>>>> into the faulting managed dataspace and resolves the page fault.
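>>>>
>>>> A sketch of how such a fault handler can be wired up, assuming the
>>>> fault-handler signal interface of Genode 16.05 ('managed_rm' and
>>>> 'designated_ds' stand for the objects described above):
>>>>
>>>> #include <base/signal.h>
>>>> #include <region_map/client.h>
>>>>
>>>> using namespace Genode;
>>>>
>>>> void handle_faults(Region_map_client &managed_rm,
>>>>                    Dataspace_capability designated_ds)
>>>> {
>>>>     /* register a signal context as fault handler of the region map */
>>>>     Signal_receiver receiver;
>>>>     Signal_context  context;
>>>>     managed_rm.fault_handler(receiver.manage(&context));
>>>>
>>>>     for (;;) {
>>>>         receiver.wait_for_signal();
>>>>
>>>>         Region_map::State state = managed_rm.state();
>>>>         if (state.type == Region_map::State::READY)
>>>>             continue;
>>>>
>>>>         /* attaching at the faulting address resolves the fault */
>>>>         managed_rm.attach_at(designated_ds,
>>>>                              state.addr & ~(4096UL - 1));
>>>>     }
>>>> }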
>>>>
>>>> To test my concept I implemented a prototypical system with a monitoring
>>>> component (called "ckpt") [1] and a target component [2].
>>>>
>>>> [1]
>>>> https://github.com/702nADOS/genode-CheckpointRestore-SharedMemory/blob/b502ffd962a87a5f9f790808b13554d6568f6d0b/src/test/concept_session_rm/server/main.cc
>>>> [2]
>>>> https://github.com/702nADOS/genode-CheckpointRestore-SharedMemory/blob/b502ffd962a87a5f9f790808b13554d6568f6d0b/src/test/concept_session_rm/client/main.cc
>>>>
>>>> The monitoring component provides a service [3] through which it
>>>> receives a Thread capability (used to pause the target component
>>>> before detaching the dataspace and to resume it afterwards) and
>>>> provides a managed dataspace to the client.
>>>>
>>>> [3]
>>>> https://github.com/702nADOS/genode-CheckpointRestore-SharedMemory/tree/b502ffd962a87a5f9f790808b13554d6568f6d0b/include/resource_session
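>>>>
>>>> In RPC terms, the session interface boils down to something like the
>>>> following sketch (the method names are illustrative, not necessarily
>>>> the actual ones from [3]):
>>>>
>>>> #include <session/session.h>
>>>> #include <base/rpc.h>
>>>> #include <dataspace/capability.h>
>>>> #include <cpu_session/cpu_session.h>
>>>>
>>>> namespace Resource { struct Session; }
>>>>
>>>> struct Resource::Session : Genode::Session
>>>> {
>>>>     static const char *service_name() { return "Resource"; }
>>>>
>>>>     /* hand the target's main-thread capability to the monitor */
>>>>     virtual void send_thread(Genode::Thread_capability thread) = 0;
>>>>
>>>>     /* obtain the managed dataspace provided by the monitor */
>>>>     virtual Genode::Dataspace_capability dataspace() = 0;
>>>>
>>>>     GENODE_RPC(Rpc_send_thread, void, send_thread,
>>>>                Genode::Thread_capability);
>>>>     GENODE_RPC(Rpc_dataspace, Genode::Dataspace_capability, dataspace);
>>>>     GENODE_RPC_INTERFACE(Rpc_send_thread, Rpc_dataspace);
>>>> };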
>>>>
>>>> The monitoring component runs a main loop which pauses the client's
>>>> main thread and detaches all attached dataspaces from the managed
>>>> dataspace. The target component also runs a main loop, which prints
>>>> (reads) a number from the managed dataspace to the console and
>>>> increments (writes) it in the managed dataspace.
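>>>>
>>>> The target's loop boils down to the following sketch, where 'value'
>>>> points to the start of the managed dataspace that the target attached
>>>> into its address space beforehand:
>>>>
>>>> #include <base/printf.h>
>>>> #include <timer_session/connection.h>
>>>>
>>>> void main_loop(volatile unsigned *value)
>>>> {
>>>>     Timer::Connection timer;
>>>>
>>>>     for (;;) {
>>>>         Genode::printf("%u\n", *value); /* read - faults when empty */
>>>>         (*value)++;                     /* write back incremented */
>>>>         timer.msleep(1000);
>>>>     }
>>>> }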
>>>>
>>>> The run script is found here [4].
>>>>
>>>> [4]
>>>> https://github.com/702nADOS/genode-CheckpointRestore-SharedMemory/blob/b502ffd962a87a5f9f790808b13554d6568f6d0b/run/concept_session_rm.run
>>>>
>>>> The scenario works for the first 3 iterations of the monitoring
>>>> component: every 4 seconds, it detaches the dataspaces from the
>>>> managed dataspace and afterwards resolves the page faults by
>>>> attaching the dataspaces back. After the 3rd iteration, the target
>>>> component accesses the theoretically empty managed dataspace but does
>>>> not trigger a page fault. In fact, it reads and writes to the
>>>> designated dataspaces as if they were attached.
>>>>
>>>> By running the run script I get the following output:
>>>> [init -> target] Initialization started
>>>> [init -> target] Requesting session to Resource service
>>>> [init -> ckpt] Initialization started
>>>> [init -> ckpt] Creating page fault handler thread
>>>> [init -> ckpt] Announcing Resource service
>>>> [init -> target] Sending main thread cap
>>>> [init -> target] Requesting dataspace cap
>>>> [init -> target] Attaching dataspace cap
>>>> [init -> target] Initialization ended
>>>> [init -> target] Starting main loop
>>>> Genode::Pager_entrypoint::entry()::<lambda(Genode::Pager_object*)>:Could
>>>> not resolve pf=6000 ip=10034bc
>>>> [init -> ckpt] Initialization ended
>>>> [init -> ckpt] Starting main loop
>>>> [init -> ckpt] Waiting for page faults
>>>> [init -> ckpt] Handling page fault: READ_FAULT pf_addr=0x00000000
>>>> [init -> ckpt]   attached sub_ds0 at address 0x00000000
>>>> [init -> ckpt] Waiting for page faults
>>>> [init -> target] 0
>>>> [init -> target] 1
>>>> [init -> target] 2
>>>> [init -> target] 3
>>>> [init -> ckpt] Iteration #0
>>>> [init -> ckpt]   valid thread
>>>> [init -> ckpt]   detaching sub_ds_cap0
>>>> [init -> ckpt]   sub_ds_cap1 already detached
>>>> Genode::Pager_entrypoint::entry()::<lambda(Genode::Pager_object*)>:Could
>>>> not resolve pf=6000 ip=10034bc
>>>> [init -> ckpt] Handling page fault: READ_FAULT pf_addr=0x00000000
>>>> [init -> ckpt]   attached sub_ds0 at address 0x00000000
>>>> [init -> ckpt] Waiting for page faults
>>>> [init -> target] 4
>>>> [init -> target] 5
>>>> [init -> target] 6
>>>> [init -> target] 7
>>>> [init -> ckpt] Iteration #1
>>>> [init -> ckpt]   valid thread
>>>> [init -> ckpt]   detaching sub_ds_cap0
>>>> [init -> ckpt]   sub_ds_cap1 already detached
>>>> [init -> target] 8
>>>> [init -> target] 9
>>>> [init -> target] 10
>>>> [init -> target] 11
>>>> [init -> ckpt] Iteration #2
>>>> [init -> ckpt]   valid thread
>>>> [init -> ckpt]   sub_ds_cap0 already detached
>>>> [init -> ckpt]   sub_ds_cap1 already detached
>>>> [init -> target] 12
>>>> [init -> target] 13
>>>>
>>>> As you can see, after "Iteration #1" ended, no page fault was caused,
>>>> although the target component printed and incremented the integer
>>>> stored in the managed dataspace.
>>>>
>>>> Could it be that the detach method was not executed correctly?
>>>>
>>>>
>>>> Kind regards
>>>> Denis
>>>>
> 

-- 
Stefan Kalkowski
Genode Labs

https://github.com/skalk · http://genode.org/



