Hi there,
I'm currently running IPC tests on different microkernels using the Genode framework.
I modified the hello_tutorial repository to implement a very simple scenario in which a server and a client communicate via RPC.
My setup is as follows: I ran "make run/hello" in the build directories of fiasco_x86, nova_x86_32, okl4_x86, and pistachio_x86, all created as instructed on the Genode website. Then I modified etc/build.conf to enable KVM, uncommented the "libports" repository, and added the "hello_tutorial" repository.
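For reference, the relevant part of my etc/build.conf looks roughly like this (a sketch; the exact option I used to enable KVM may differ from this):

  REPOSITORIES += $(GENODE_DIR)/repos/libports
  REPOSITORIES += $(GENODE_DIR)/repos/hello_tutorial

  # pass KVM acceleration to the Qemu-based run tool
  QEMU_OPT += -enable-kvm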
In client.h, I re-implemented the add() function to invoke call<Rpc_add>() ten times in sequence (to average out the variation in elapsed cycles). Before the first invocation and after the tenth, I execute rdtsc and store the current cycle count in a variable. Before add() returns, I print the difference between the two rdtsc readings.
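Roughly, the modified add() looks like the following (a minimal sketch of what I did; the rdtsc helper is my own, and the printed value is just the raw cycle difference):

  /* simple TSC reader -- helper written for this test */
  static inline Genode::uint64_t rdtsc()
  {
      Genode::uint32_t lo, hi;
      asm volatile ("rdtsc" : "=a" (lo), "=d" (hi));
      return ((Genode::uint64_t)hi << 32) | lo;
  }

  int add(int a, int b) override
  {
      Genode::uint64_t const start = rdtsc();

      /* ten RPC invocations back-to-back to average out jitter */
      int result = 0;
      for (int i = 0; i < 10; i++)
          result = call<Rpc_add>(a, b);

      Genode::uint64_t const end = rdtsc();

      /* Genode::log from base/log.h */
      Genode::log("cycles for 10 Rpc_add calls: ", end - start);
      return result;
  }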
Depending on the microkernel, the cycles elapsed across the ten Rpc_add calls range from around 80,000 (for okl4_x86) to 170,000 (for fiasco_x86). That is, ONE Rpc_add takes around 8,000 to 17,000 cycles, which seems very long.
So I want to know: are these results reasonable, or did I miss anything in my tests?
Thank you very much!
Yelly
Hello,
On 03.04.2018 05:00, yu000013 wrote:
> Then I modified etc/build.conf to enable KVM...
>
> [...] That is, ONE Rpc_add takes around 8,000 to 17,000 cycles, which
> seems very long.
>
> So I want to know: are these results reasonable, or did I miss anything
> in my tests?
your intuition is right. The numbers are certainly skewed by running Genode in a virtual machine. To get reasonable numbers, you'll need to execute the benchmark on real hardware.
May I ask about the motivation behind these measurements? I fear that you may use the outcome of microbenchmarks to draw generalized conclusions like "kernel X is faster than kernel Y". In real-world scenarios, performance is influenced by many factors other than IPC round-trip times, such as cache/TLB footprint, platform setup (clocks, caching, use of MSIs), scheduling, or the friction between the kernel interface and user space. To give an example of the latter, traditional L4 kernels lack a mechanism for asynchronous notifications. On these kernels, Genode's signaling API is implemented using synchronous IPC with the core component acting as a proxy service. Even though synchronous IPC is fast, asynchronous notifications thereby carry a lot of overhead.
I encourage you to take your measurements at a higher level to obtain meaningful information from your work. Possible examples are the throughput and latency of the NIC session interface, the latency of ROM session updates, IRQ session latency, the duration of component startup, framebuffer access, timer accuracy, or the latency of delivering user-input events. As Genode is a component-based system, the chaining of components is particularly interesting, e.g., how much does network performance suffer from piping the traffic through a number of bump-in-the-wire network-processing components?
Once one finds an interesting anomaly, a specially crafted microbenchmark that artificially magnifies the issue would be the right tool to pinpoint the problem.
Cheers
Norman