Hi Marc,
Marc CHALAND wrote:
2008/8/13 Norman Feske <norman.feske@...1...>:
Indeed, the current version of Genode locates stacks in the heap, which is generally a bad idea because this makes it really hard to detect stack overflows.
Sure. Better still, the stack should not be executable. Ideally, Genode would provide a way to prevent memory from being both writable and executable. On TUDOS, we found a way to do this: create a DM_Phys task (nxdm_phys, for example) that provides only NX memory. Then only the loader gets memory from the normal dm_phys and hands non-writable executable memory to the launched processes. But, I know, Fiasco doesn't provide an abstraction for the NX bit.
Supporting the NX bit makes much sense and it would also fit nicely into the Genode interfaces. However, this would not help for the current problem because both the stack and the heap are read/writable.
To solve your current problem, it would be helpful to be able to reproduce it. How do you trigger the bug, and what does your config file look like? Could you provide us with the backtraces of Core's threads?
This is very easy to reproduce with svn revision 9. My menu.lst is:

set MCGEN="(nd)/tftpboot/l4-test3/marc/genode-build"

title Genode marc - demo
kernel $(MCGEN)/3rd/l4env/bin/x86_586/bootstrap
modaddr 0x03000000
module $(MCGEN)/3rd/fiasco_x86/main -serial_esc -comspeed 115200 -comport 1
module $(MCGEN)/3rd/l4env/bin/x86_586/l4v2/sigma0
module $(MCGEN)/genode_fiasco/bin/core
module $(MCGEN)/genode_fiasco/bin/init
module $(MCGEN)/genode_fiasco/bin/config
module $(MCGEN)/genode_fiasco/bin/ps2_drv
module $(MCGEN)/genode_fiasco/bin/vesa_drv
module $(MCGEN)/genode_fiasco/bin/nitpicker
module $(MCGEN)/genode_fiasco/bin/timer
module $(MCGEN)/genode_fiasco/bin/launchpad
module $(MCGEN)/genode_fiasco/bin/testnit
module $(MCGEN)/genode_fiasco/bin/scout
module $(MCGEN)/genode_fiasco/bin/nitlog
module $(MCGEN)/genode_fiasco/bin/liquid_fb
If I remove the applications after launchpad, the problem goes away, but then I cannot launch anything :/. The config file is:
<config>
  <start>
    <filename>timer</filename>
    <ram_quota>1M</ram_quota>
  </start>
  <start>
    <filename>ps2_drv</filename>
    <ram_quota>1M</ram_quota>
  </start>
  <start>
    <filename>vesa_drv</filename>
    <ram_quota>1M</ram_quota>
  </start>
  <start>
    <filename>nitpicker</filename>
    <ram_quota>1M</ram_quota>
  </start>
  <start>
    <filename>launchpad</filename>
    <ram_quota>32M</ram_quota>
  </start>
</config>
Thank you for your configuration but unfortunately, I was not able to trigger the bug. Are you using the tool chain from our website?
_curr_obj is at 0x647a4. The backtrace of 4.04 is:

backtrace (thread 4.04, fp=000647a8, pc=010024dc):
 #1 000647a8 010024dc
 #2 000647f8 01002726
 #3 00064928 0100337b
 #4 00064a58 0100285e
 #5 00064b88 0100285e
 #6 00064cb8 0100285e
 #7 00064de8 0100285e
 #8 00064f18 0100285e
 #9 00065048 0100285e
#10 00065178 0100285e
#11 000652a8 0100285e
#12 000652c8 01000a1c
#13 00065318 0100227e
#14 00065338 01003c13
#15 00065358 010000e7
#16 00065378 0100030f
#17 00065398 01000854
#18 000653c8 010008c6
#19 00065408 01002215
#20 00065428 010264f5
#21 00065458 01031629
#22 00065498 0100f959
#23 000654b8 010040d1
#24 000654d8 01025c00
#25 00065528 0102c062
#26 00065658 0102b6ac
#27 000656c8 01026240
#28 00065748 010181aa
#29 000657a8 01036278
--kernel-bt-follows--
 #2 f005f438 f000c216
 #3 f005f4c4 f002a306
 #4 f005f4dc f001b04d
 #5 f005f4f4 f001af1f
 #6 c0204f2c f0023ac2
 #7 c0204fa4 f0022491
 #8 c0204fac f003aa04
This backtrace shows me where the problem lies: the difference between the first and the last stack frame is exactly 4K, which corresponds to the stack size of Core's Init-Child thread (4.4). The problem occurs when opening a new RM session, which requires memory allocation for the session object. During this memory allocation, a slab block fills up completely, requiring the allocation of another slab block. This nested allocation eats more than 4K. Could you try setting the stack size to 8192 instead of 4096? Just modify the template argument for the server activation at
base/include/base/child.h:124
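The 4K figure can be checked directly against the frame pointers in the backtrace of thread 4.04. The following Python sketch is only an illustration: the list is copied from the fp column of that backtrace, the helper names are made up for this example, and it assumes a conventional frame-pointer chain in which the distance between two consecutive frame pointers approximates the size of one frame.

```python
# Frame pointers of thread 4.04, innermost frame (#1) first,
# copied from the backtrace posted in this thread.
FRAME_POINTERS = [
    0x647a8, 0x647f8, 0x64928, 0x64a58, 0x64b88, 0x64cb8, 0x64de8, 0x64f18,
    0x65048, 0x65178, 0x652a8, 0x652c8, 0x65318, 0x65338, 0x65358, 0x65378,
    0x65398, 0x653c8, 0x65408, 0x65428, 0x65458, 0x65498, 0x654b8, 0x654d8,
    0x65528, 0x65658, 0x656c8, 0x65748, 0x657a8,
]


def stack_span(fps):
    """Distance in bytes between the innermost and outermost frame pointer."""
    return fps[-1] - fps[0]


def frame_sizes(fps):
    """Distance in bytes between each pair of consecutive frame pointers."""
    return [outer - inner for inner, outer in zip(fps, fps[1:])]


span = stack_span(FRAME_POINTERS)
largest = max(frame_sizes(FRAME_POINTERS))

print(f"stack span:    {span} bytes ({span:#x})")
print(f"largest frame: {largest} bytes ({largest:#x})")
```

The span comes out as 4096 bytes (0x1000), matching the 4K stack. The largest step is 0x130 bytes and repeats across the frames that share the return address 0100285e, which is consistent with the nested allocation described above.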
Best regards
Norman
2008/8/14 Norman Feske <norman.feske@...1...>:
Supporting the NX bit makes much sense and it would also fit nicely into the Genode interfaces. However, this would not help for the current problem because both the stack and the heap are read/writable.
Yes, of course.
Thank you for your configuration but unfortunately, I was not able to trigger the bug. Are you using the tool chain from our website?
No. The problem doesn't appear if I use your tool chain. Maybe gcc 4.1.1 uses less stack? I had to update my glibc to 2.4.
Could you try setting the stack size to 8192 instead of 4096? Just modify the template argument for the server activation at base/include/base/child.h:124
I also tried this with my gcc 4.2.2, and the problem disappears too.
In both cases, I run into "mad mouse" trouble when, for example, I change the window size of scout. I get the following messages:

[init -> ps2_drv] void* fwrite(): fwrite - not yet implemented
[init -> ps2_drv] void* fputs(const char*, void*): fputs: "Exception::Overflow"
[init -> ps2_drv] void* fwrite(): fwrite - not yet implemented
[init -> ps2_drv] void* abort(): abort called

I guess sync is lost. I cannot get the mouse back after this. On TUDOS, I fixed the lost sync by changing driver priorities. Is there any way to do this in Genode?
Today, I decided to use the Genode tool chain and not to modify the server activation stack.
Regards Marc
Marc CHALAND wrote:
In both cases, I run into "mad mouse" trouble when, for example, I change the window size of scout. I get the following messages:

[init -> ps2_drv] void* fwrite(): fwrite - not yet implemented
[init -> ps2_drv] void* fputs(const char*, void*): fputs: "Exception::Overflow"
[init -> ps2_drv] void* fwrite(): fwrite - not yet implemented
[init -> ps2_drv] void* abort(): abort called

I guess sync is lost. I cannot get the mouse back after this. On TUDOS, I fixed the lost sync by changing driver priorities. Is there any way to do this in Genode?
Configuring priorities is not supported yet. The observed behaviour occurs when the event queue inside the PS/2 driver overruns. This happens when the GUI server is not able to fetch new batches of input events fast enough. Normally, Nitpicker should fetch events every 10ms. The PS/2 driver queues up to 255 events, which should be enough (normally, PS/2 generates no more than 16Kbit/s, which would correspond to ca. 66 mouse events per 10ms). The queue overrun suggests that the driver is actually handling the incoming device interrupts properly. I guess the sleep accuracy of the timer service (used by Nitpicker to wait 10ms) is the problem, because this service is assigned the same round-robin priority as any other process.
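The headroom of the event queue can be estimated with a few lines of arithmetic. The Python sketch below is a rough check, not a statement about the actual driver: the 16 Kbit/s rate, the 10 ms interval, and the 255-entry queue come from the message above, while the 24 bits per event is an assumption (a 3-byte mouse packet, ignoring start, stop, and parity bits on the wire).

```python
LINK_RATE_BITS_PER_S = 16_000  # upper bound quoted in the message
POLL_INTERVAL_MS = 10          # Nitpicker's fetch interval per the message
QUEUE_CAPACITY = 255           # PS/2 driver queue size per the message
BITS_PER_EVENT = 3 * 8         # ASSUMPTION: 3-byte packet, no framing bits

# Bits the device can deliver within one fetch interval.
bits_per_poll = LINK_RATE_BITS_PER_S * POLL_INTERVAL_MS // 1000

# Mouse events that fit into that many bits.
events_per_poll = bits_per_poll / BITS_PER_EVENT

# How long the consumer may stall before a full-rate stream fills the queue.
tolerated_stall_ms = QUEUE_CAPACITY / events_per_poll * POLL_INTERVAL_MS

print(f"bits per {POLL_INTERVAL_MS} ms:   {bits_per_poll}")
print(f"events per {POLL_INTERVAL_MS} ms: {events_per_poll:.1f}")
print(f"tolerated stall:  {tolerated_stall_ms:.1f} ms")
```

Under these assumptions the estimate is closer to 7 events per 10 ms (a figure of about 66 would match a 100 ms window), so the conclusion holds with even more margin: the queue should only overrun if events go unfetched for several hundred milliseconds.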
I guess it's time to make priorities configurable in Genode to see if my guess is right ;-)
Hi Marc,
the patch attached to this email is a quick hack to make priorities configurable for the Fiasco version of Genode, just to enable you to adjust priorities. By default, all processes run at priority 0x10. With the patch, you can now override the default for each entry in your config file by specifying a <priority> tag. For example:
<start>
  <filename>timer</filename>
  <ram_quota>1M</ram_quota>
  <priority>0x20</priority>
</start>
Could you try executing the timer server / ps2 driver / nitpicker on higher priorities?
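To make the override semantics concrete, here is a small Python sketch of how such a config could be interpreted. It is not the code of the patch; the function name and the two-entry config are made up for this illustration. It only models the behaviour described above: an entry without a <priority> tag gets the default 0x10, and an entry with the tag gets the given value.

```python
import xml.etree.ElementTree as ET

DEFAULT_PRIORITY = 0x10  # default priority stated in the message

# Hypothetical config: timer overrides the priority, nitpicker does not.
CONFIG = """
<config>
  <start>
    <filename>timer</filename>
    <ram_quota>1M</ram_quota>
    <priority>0x20</priority>
  </start>
  <start>
    <filename>nitpicker</filename>
    <ram_quota>1M</ram_quota>
  </start>
</config>
"""


def effective_priorities(config_xml):
    """Map each <start> entry's filename to its effective priority."""
    result = {}
    for start in ET.fromstring(config_xml).iter("start"):
        name = start.findtext("filename").strip()
        prio_text = start.findtext("priority")
        # int(..., 0) accepts both "0x20" and plain decimal notation.
        result[name] = int(prio_text.strip(), 0) if prio_text else DEFAULT_PRIORITY
    return result


for name, prio in effective_priorities(CONFIG).items():
    print(f"{name}: {prio:#x}")
```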
Hi again,
I just noticed that at the end of my patch, a small change of 'tool/fiasco_build.conf' slipped in. Please do not apply that last part ;-)
Hi,
2008/8/14 Norman Feske <norman.feske@...1...>:
Could you try executing the timer server / ps2 driver / nitpicker on higher priorities?
The patch applied successfully, and the priorities change as expected. However, I still lose my mouse. In fact, this happens when I resize a window to a big size. I will try to investigate whether there is some buffer overflow in nitpicker.
But I have a question: the IRQ thread in the core process is still at 0x10. Could this cause trouble?
Regards Marc
Hi Marc,
The patch applied successfully, and the priorities change as expected. However, I still lose my mouse. In fact, this happens when I resize a window to a big size. I will try to investigate whether there is some buffer overflow in nitpicker.
But I have a question: the IRQ thread in the core process is still at 0x10. Could this cause trouble?
Sure, on high load (e.g., large pixel-copy operations via CPU), an IRQ thread could be preempted at any time, getting delayed by a whole time slice of 10ms or even longer. Hence, boosting the priority of IRQ threads is certainly a good idea. The attached patch assigns the priority 0xc0 to these threads. Could you give it a try?
Regards Norman
2008/8/18 Norman Feske <norman.feske@...1...>:
Hence, boosting the priority of IRQ threads is certainly a good idea. The attached patch assigns the priority 0xc0 to these threads. Could you give it a try?
Perfect :). In fact, this patch alone is enough to get rid of the mad mouse. There is no need to change the priority of ps2_drv or timer on my test bed. The demo looks fine, and the Genode concepts are very interesting :).
Regards Marc
Hello Marc,
Marc CHALAND wrote:
Perfect :). In fact, this patch alone is enough to get rid of the mad mouse. There is no need to change the priority of ps2_drv or timer on my test bed. The demo looks fine, and the Genode concepts are very interesting :).
Great to hear that it's working now! We have committed the IRQ-priority patch to the Subversion repository. Thanks for your very helpful suggestion! :-)
Regards Norman