Hi Adrian,
On 06.07.2015 15:07, Adrian-Ken Rueegsegger wrote:
I have implemented the above mentioned steps, see [1], and the boot time of a custom Linux (buildroot) from the bootloader menu to the prompt has been halved from ~1 min 03 seconds down to ~33 seconds.
Great to hear that the configuration already had a positive effect :)
However, it seems that tweaking the quota values [2] has no further effect on the execution time.
There is a bug [1] in the current master branch that prevents the initial quota configuration when a thread is constructed. As a consequence, threads only receive their quota once another thread of the same session gets created or destroyed. Could you please apply the commit referenced in the issue and try again?
Changing the base-hw super period to 100ms and the timeslice to 1ms in [3] reduces the boot time to ~19 seconds. I am not quite sure about the exact reason for the speedup but I presume it is due to the fact that the super period is much shorter and thus the quota of each thread is refilled more frequently.
I've talked with Alexander about that. Our main ideas regarding this:
* The shorter super period mainly benefits threads with quota. It makes it less probable that quota threads leave part of their quota unused during a super period and, as you mentioned, quota gets refilled more often. Thus, non-quota threads are more often left with only the unassigned quota for their purposes. The shorter time slice lowers the latency of non-quota execution but also reduces its throughput. In fact, finding the best super period and time slice is an open problem because we do not have much "real world" data on the base-hw scheduling yet. For reference, a sketch of the parameters involved follows after this list.
* For a comparison with NOVA, it might be a good idea to assign 100% of the CPU quota, because then priorities are absolute on base-hw as well. Later, you may switch the best-effort parts of the scenario back to non-quota mode and see whether the performance benefits from it.
* For a better debugging basis, it might be useful to get the CPU-utilization stats of the scenario. This can be achieved in the kernel quite simply and without distorting the measurement. You can find a basic implementation and demonstration on my branch [2] (all .* commits). The output is triggered via Kernel::print_char(0), which can be called from anywhere in userland, but you may also print it, e.g., on a console-input IRQ through THREAD_QUOTA_STATS_IRQ as done in the last commit; a minimal trigger sketch follows after this list. The printed "spent" values are the timer ticks that a thread has used for itself or spent helping another thread. The printed "used" values are the timer ticks during which the thread was really executed (on its own or through a helper).
* Besides vbox, the Genode device drivers (especially the timer) should also receive a reasonable amount of quota. On x86, the timer is also pretty time-intensive: it should be able to update its state at least 19 times during a super period. In my Qemu tests with cpu_quota.run, 10% of the super period (5 ms per update) was definitely enough for the timer, but I assume it also works with less quota.
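To make the tuning knobs from the first point concrete, here is a minimal sketch of the compile-time scheduling parameters; the constant names, values, and location are only illustrative and may differ from the actual code you changed in [3]:

  /* illustrative sketch of the base-hw scheduling parameters;
   * names, values, and location are assumptions, not the actual code */
  namespace Kernel {
    enum {
      /* length of one super period in milliseconds */
      super_period_ms = 1000,

      /* time slice for non-quota (round-robin) execution in milliseconds */
      time_slice_ms = 10,
    };
  }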
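And as a starting point for triggering the stats output mentioned in the third point, a minimal sketch, assuming the kernel from my branch [2] is in place (the header path is given from memory and may differ):

  /* request the per-thread CPU-utilization stats from the patched kernel [2] */
  #include <kernel/interface.h> /* declares Kernel::print_char; path is from memory */

  static void dump_cpu_stats()
  {
    /* the patched kernel interprets print_char(0) as a request to
     * print the "spent"/"used" values of all threads */
    Kernel::print_char(0);
  }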
I gave the cpu_quota scenario, including your changes regarding #1616 [4], a try on hw_x86_64, but it seems that it does not complete. Should the test pass successfully with your latest changes?
I think the problem is the timer. On ARM, the timer is configured once for an X-second timeout and then sleeps until the timeout has elapsed. On x86, however, the timer needs the guarantee of being scheduled frequently during the timeout as well, as mentioned above. Thus, the test simply takes too long although the measured results are good in principle. I'm currently working on this and will keep you up to date.
As this boot time is still a lot longer than e.g. NOVA/VirtualBox, which boots the same system in about 7 seconds, there still seem to be lingering issues with regard to the base-hw scheduling. I would be glad if you could investigate this problem.
As outlined above, I think we can still raise the performance through modifications at the application level. It would also be helpful to see your CPU-utilization stats if the scheduling remains a problem.
Cheers, Martin
[1] https://github.com/genodelabs/genode/issues/1620
[2] https://github.com/m-stein/genode/tree/hw_cpu_quota_stats