Genode Page-Fault Handling and IRQ architecture
Daniel Waddington
d.waddington at ...60...
Wed Dec 19 22:42:03 CET 2012
Hi Norman, thanks for your quick reply. Responses inline...
> -----Original Message-----
> From: Norman Feske [mailto:norman.feske at ...1...]
> Sent: Wednesday, December 19, 2012 11:08 AM
> To: genode-main at lists.sourceforge.net
> Subject: Re: Genode Page-Fault Handling and IRQ architecture
>
> Hi Daniel,
>
> thanks for providing more details about the motivation behind your
> questions. From this information, I gather that you actually do not
> require the implementation of on-demand-paging policies outside of core.
> Is this correct?
Yes.
> > 1.) Distribute physical memory management across N cores and have each
> > locally handle page faults. The purpose is to eliminate contention on
> > physical memory AVL trees and avoid cross-core IPC as much as
> > possible. Of course this requires partitioning the physical memory
> > space. I also want to avoid eager mapping.
>
> I can clearly see how to achieve these goals:
Good. That's encouraging!
> * Genode on Fiasco.OC already populates page tables in a lazy way.
> There are no eager mappings. When attaching a dataspace to the
> RM session of a process, the page table of the process remains
> unchanged. The mapping gets inserted not before the process touches
> the virtual memory location (and thereby triggers the page-fault
> mechanism implemented by core).
>
> * To avoid cross-core IPC, there should be one pager thread per core.
> When setting the affinity of a thread, the pager should be set
> accordingly. This way, the handling of page faults would never cross
>   CPU boundaries. That is actually quite straightforward to implement.
> On NOVA, we are even using one pager per thread. The flexibility to
>   do that is built into the framework.
>
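Just to make sure I follow the lazy path, here is a minimal sketch of how I
understand it against the current base API (corrections welcome): attach()
only reserves the region in the RM session, and the page-table entry is
inserted by core's pager on the first access.

  #include <base/env.h>
  #include <dataspace/capability.h>

  using namespace Genode;

  /* back the region with RAM from our RAM session */
  Dataspace_capability ds = env()->ram_session()->alloc(4096);

  /* attach to our RM session - no page-table entry exists yet */
  char *ptr = env()->rm_session()->attach(ds);

  /* first touch faults; core's pager resolves it and inserts the mapping */
  ptr[0] = 1;
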
> Also, I would investigate using multiple entrypoints (i.e., one for
> each CPU) to handle core's services. For doing this, we could attach
> affinity information as session arguments and then direct the session
> request to the right entrypoint at session-creation time.
Sounds sensible. I would use a mask to future-proof for other schedulers.
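To illustrate what I have in mind (purely hypothetical - an 'affinity'
session argument does not exist today), the client side could pass a CPU
mask much like 'priority' is passed now:

  #include <base/snprintf.h>

  /* CPUs 0 and 1, for example */
  unsigned long cpu_mask = 0x3;

  /* hypothetical session-argument string carrying a CPU affinity mask,
     analogous to the existing 'priority' argument of CPU sessions */
  char args[64];
  Genode::snprintf(args, sizeof(args),
                   "ram_quota=4K, affinity=0x%lx", cpu_mask);

  /* the parent would then route the session request to the entrypoint
     running on (one of) the CPUs in the mask */
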
> * To partition the physical memory, RAM sessions would need to carry
>   affinity information with them - similar to how CPU sessions already
>   have priority information associated with them. Each RAM
> session would then use a different pool of physical memory.
Could I not use nested RM sessions/dataspaces to help with this?
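For example (a rough sketch, if I read the RM interface right): a sub RM
session acts as a managed dataspace, so each node could populate its own
window from a node-local RAM pool and attach the window as one piece.

  #include <base/env.h>
  #include <rm_session/connection.h>

  using namespace Genode;

  enum { WINDOW_SIZE = 16*1024*1024 };

  /* sub RM session that represents a 16 MiB managed dataspace */
  Rm_connection window(0, WINDOW_SIZE);

  /* fill the window with memory, ideally taken from a node-local RAM pool */
  window.attach(env()->ram_session()->alloc(1024*1024));

  /* attach the whole managed dataspace to our address space */
  void *base = env()->rm_session()->attach(window.dataspace());
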
> > 2.) For the purpose of scaling device drivers, I would like to avoid
> > serialization on the IRQ handling. I would like the kernel to deliver
> > the IRQ request message directly to a registered handler or via a
> > 'core' thread on the same core. We then use ACPI to nicely route IRQs
> > across cores and parallelize IRQ handling load - this is useful for
> multi-queue NICs etc.
>
> Maybe the sequence diagram was a bit misleading. It is displaying the
> sequence for just one IRQ number. For each IRQ, there is a completely
> different flow of control. I.e., there is one thread per IRQ session in
> core. The processing of IRQs is not serialized at all. They are processed
> concurrently.
>
> There are opportunities for optimizations though. For example, we could
> consider delegating the capability selectors for the used kernel IRQ
> objects to the respective IRQ-session clients. This way, we could take
> core completely out of the loop for the IRQ handling on Fiasco.OC. We
> haven't implemented this optimization yet for two mundane reasons.
> First, we wanted to avoid a special case for Fiasco.OC unless we are sure
> that the optimization is actually beneficial. And second, we handle shared
> IRQs in core. We haven't yet taken the time for investigating how to
> handle shared IRQs with the sole use of kernel IRQ objects.
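For reference, this is how I currently consume an IRQ from the client side
(a sketch, assuming the Irq_session interface as I understand it); one
session - and thus one thread in core - per interrupt line:

  #include <irq_session/connection.h>

  using namespace Genode;

  /* one IRQ session per interrupt line, e.g. GSI 11 */
  Irq_connection irq(11);

  for (;;) {
      irq.wait_for_irq();   /* blocks until the interrupt fires */
      /* ... service the device, then loop to re-arm the IRQ ... */
  }
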
>
> > I have been exploring the idea of setting up another child of core
> > that has special kernel capabilities (yes I know TCB expansion) so it
> > can set up threads and IRQs in this way. What do you think of this
> > idea?
>
> To me this looks like you are creating a new OS personality (or runtime)
> on top of Genode - similar to how L4Linux works. So you are effectively
> bypassing Genode (and naturally working around potential scalability
> issues). Personally, I would prefer to improve the underlying framework
> (in particular the implementation of core) to accommodate your
> requirements in the first place. This way, all Genode components would
> benefit, not only those that are children of your runtime.
Yes, I guess so - that is, provided it makes sense for the general Genode
distribution. Also, this might be a stop-gap until we/you get some of the
changes into Genode.
Daniel