Hi Daniel,
thanks for providing more details about the motivation behind your questions. From this information, I gather that you actually do not require the implementation of on-demand-paging policies outside of core. Is this correct?
1.) Distribute physical memory management across N cores and have each locally handle page faults. The purpose is to eliminate contention on physical memory AVL trees and avoid cross-core IPC as much as possible. Of course this requires partitioning the physical memory space. I also want to avoid eager mapping.
I can clearly see how to achieve these goals:
* Genode on Fiasco.OC already populates page tables in a lazy way. There are no eager mappings. When attaching a dataspace to the RM session of a process, the page table of the process remains unchanged. The mapping is inserted only once the process touches the virtual memory location and thereby triggers the page-fault mechanism implemented by core (see the first sketch after this list).
* To avoid cross-core IPC, there should be one pager thread per core. When setting the affinity of a thread, the pager should be set accordingly. This way, the handling of page faults would never cross CPU boundaries. That is actually quite straightforward to implement. On NOVA, we even use one pager per thread. The flexibility to do that is built into the framework.
Also, I would investigate using multiple entrypoints (i.e., one per CPU) to handle core's services. To do this, we could attach affinity information as session arguments and direct each session request to the right entrypoint at session-creation time.
* To partition the physical memory, RAM sessions would need to carry affinity information with them - similar to how CPU sessions already have priority information associated with them. Each RAM session would then use a different pool of physical memory (see the second sketch after this list).
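
To illustrate the first point with a minimal client-side sketch: I am using the classic Genode::env() RAM/RM-session interfaces from that era here, so take the exact signatures with a grain of salt. The point is that allocation and attach alone leave the page table untouched; the mapping appears only on the first access.

/*
 * Minimal sketch of lazy page-table population, based on the classic
 * Genode::env() interfaces - exact signatures may differ between releases.
 */
#include <base/env.h>

static void lazy_mapping_example()
{
	using namespace Genode;

	/* allocate backing store - no page-table entry is created yet */
	Ram_dataspace_capability ds = env()->ram_session()->alloc(4096);

	/* attach the dataspace to the local RM session - the page table
	 * of this process still remains unchanged */
	char *ptr = env()->rm_session()->attach(ds);

	/* the first access triggers a page fault, which core's pager
	 * resolves by inserting the mapping - only now does the page
	 * table of the process change */
	ptr[0] = 1;
}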
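
To make the affinity-routing idea a bit more concrete, here is a rough, self-contained sketch. None of these names ('Entrypoint', 'Phys_pool', 'route_session_request', the "affinity" session argument) exist in Genode today - it merely illustrates how a session argument could select both the per-CPU entrypoint and the per-CPU pool of physical memory:

/*
 * Hypothetical sketch, not actual Genode code: a session argument such as
 * "affinity=2" selects the per-CPU entrypoint and physical-memory pool.
 */
#include <cstdio>
#include <cstdlib>
#include <cstring>

enum { MAX_CPUS = 4 };

/* stands for one partition of physical memory, managed on one CPU */
struct Phys_pool { };

/* stands for a core RPC entrypoint (and pager) bound to one CPU */
struct Entrypoint
{
	unsigned cpu;

	void serve(char const *args, Phys_pool &)
	{
		std::printf("session \"%s\" handled on CPU %u\n", args, cpu);
	}
};

static Entrypoint entrypoints[MAX_CPUS];
static Phys_pool  pools[MAX_CPUS];

/* crude stand-in for parsing a session argument like "affinity=2" */
static unsigned affinity_from_args(char const *args)
{
	char const *s = std::strstr(args, "affinity=");
	return s ? std::atoi(s + std::strlen("affinity=")) % MAX_CPUS : 0;
}

/* called at session-creation time */
static void route_session_request(char const *args)
{
	unsigned const cpu = affinity_from_args(args);

	/*
	 * Everything that follows stays on 'cpu': RAM allocations for this
	 * session come from pools[cpu], and page faults of threads with this
	 * affinity are resolved by the pager on the same CPU, so no
	 * cross-core IPC is involved.
	 */
	entrypoints[cpu].serve(args, pools[cpu]);
}

int main()
{
	for (unsigned i = 0; i < MAX_CPUS; i++)
		entrypoints[i].cpu = i;

	route_session_request("label=driver, ram_quota=4K, affinity=2");
	return 0;
}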
2.) For the purpose of scaling device drivers, I would like to avoid serialization on the IRQ handling. I would like the kernel to deliver the IRQ request message directly to a registered handler or via a 'core' thread on the same core. We then use ACPI to nicely route IRQs across cores and parallelize IRQ handling load - this is useful for multi-queue NICs etc.
Maybe the sequence diagram was a bit misleading. It displays the sequence for just one IRQ number. For each IRQ, there is a completely separate flow of control, i.e., there is one thread per IRQ session in core. The processing of different IRQs is not serialized at all; they are handled concurrently.
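
From a driver's point of view, this roughly looks as follows with the classic blocking interface. I am using 'Irq_connection::wait_for_irq()' as it existed back then; treat the exact signatures as an assumption on my part. With one such handler per IRQ number (and its affinity set to the CPU the IRQ is routed to), the flows for different IRQs never meet on a shared thread.

/*
 * Sketch of a per-driver IRQ handler thread, using the classic blocking
 * Irq_session interface - signatures may differ between releases.
 */
#include <base/thread.h>
#include <irq_session/connection.h>

struct Irq_handler : Genode::Thread<8192>
{
	Genode::Irq_connection _irq;

	Irq_handler(unsigned irq_number)
	: Genode::Thread<8192>("irq_handler"), _irq(irq_number) { start(); }

	void entry()
	{
		for (;;) {
			/* block until core reports the next occurrence of the IRQ */
			_irq.wait_for_irq();

			/* handle and acknowledge the interrupt at the device ... */
		}
	}
};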
There are opportunities for optimization, though. For example, we could consider delegating the capability selectors for the used kernel IRQ objects to the respective IRQ-session clients. This way, we could take core completely out of the loop for IRQ handling on Fiasco.OC. We haven't implemented this optimization yet for two mundane reasons. First, we wanted to avoid a special case for Fiasco.OC unless we are sure that the optimization is actually beneficial. Second, we handle shared IRQs in core, and we haven't yet taken the time to investigate how shared IRQs could be handled with kernel IRQ objects alone.
I have been exploring the idea of setting up another child of core that has special kernel capabilities (yes I know TCB expansion) so it can set up threads and IRQs in this way. What do you think of this idea?
To me this looks like you are creating a new OS personality (or runtime) on top of Genode - similar to how L4Linux works. So you are effectively bypassing Genode (and naturally working around potential scalability issues). Personally, I would prefer to improve the underlying framework (in particular the implementation of core) to accommodate your requirements in the first place. This way, all Genode components would benefit, not only those that are children of your runtime.
Cheers
Norman