Problem with 'test-pci'


Frank Kaiser

30 Jul 2009

4:14 p.m.

Hello

In preparation for a certain task, I want to check the PCI resources of my platform (IVI platform with Intel ATOM). For this purpose I built Genode-on-OKL4, consisting only of a minimum driver set and the test-pci application. Running this image in qemu looks good, but on the IVI platform the init process fails with a page fault before or when starting the PCI driver, which is the first entry in the config file. The error message is:

no RM attachment (READ pf_addr=6000 pf_ip=2001286 from 01)

I have no clue what this message is trying to tell me. The given IP points to the function Genode::strncpy(). I also wonder why the system wants to read from virtual address 0x6000, because all modules are allocated beginning at virtual address 0x02000000. Checking init's pagetable with OKL4's KDB on qemu shows a number of allocations below:

00000000 [00140027]: tree=f0140000

00001000 [001f5067]: phys=001f5000 pg=f0140004 4KB rwx (RWX) user WB

00003000 [001f7067]: phys=001f7000 pg=f014000c 4KB rwx (RWX) user WB

00004000 [001f8067]: phys=001f8000 pg=f0140010 4KB rwx (RWX) user WB

00005000 [001df025]: phys=001df000 pg=f0140014 4KB r~x (R~X) user WB

00006000 [00275067]: phys=00275000 pg=f0140018 4KB rwx (RWX) user WB

00007000 [00276067]: phys=00276000 pg=f014001c 4KB rwx (RWX) user WB

00008000 [00277067]: phys=00277000 pg=f0140020 4KB rwx (RWX) user WB

00009000 [00278067]: phys=00278000 pg=f0140024 4KB rwx (RWX) user WB

0000a000 [00368067]: phys=00368000 pg=f0140028 4KB rwx (RWX) user WB

0000b000 [00369067]: phys=00369000 pg=f014002c 4KB rwx (RWX) user WB

0000c000 [0036a067]: phys=0036a000 pg=f0140030 4KB rwx (RWX) user WB

0000d000 [0036b067]: phys=0036b000 pg=f0140034 4KB rwx (RWX) user WB

0000e000 [0037b067]: phys=0037b000 pg=f0140038 4KB rwx (RWX) user WB

00012000 [003fa067]: phys=003fa000 pg=f0140048 4KB rwx (RWX) user WB

00016000 [00852067]: phys=00852000 pg=f0140058 4KB rwx (RWX) user WB

0004a000 [00336067]: phys=00336000 pg=f0140128 4KB rwx (RWX) user WB

00066000 [00370067]: phys=00370000 pg=f0140198 4KB rwx (RWX) user WB

On the IVI platform, this area looks like this at the time of the page fault:

00000000 [00141027]: tree=f0141000

00001000 [001f5067]: phys=001f5000 pg=f0141004 4KB rwx (RWX) user WB

00005000 [001df025]: phys=001df000 pg=f0141014 4KB r~x (R~X) user WB

I'd like to get some hints on where to look in the code to find the cause of the problem. Since I cannot debug the platform, I will probably have to add more trace messages to get additional information about what is going on.

Regards

Frank


Christian Helmuth

30 Jul

8:54 p.m.

Hi,

On Thu, Jul 30, 2009 at 06:14:03PM +0200, Frank Kaiser wrote:

> In preparation for a certain task, I want to check the PCI resources of my platform (IVI platform with Intel ATOM). For this purpose I built Genode-on-OKL4, consisting only of a minimum driver set and the test-pci application. Running this image in qemu looks good, but on the IVI platform the init process fails with a page fault before or when starting the PCI driver, which is the first entry in the config file. The error message is:
>
> no RM attachment (READ pf_addr=6000 pf_ip=2001286 from 01)
>
> I have no clue what this message is trying to tell me.

The message indicates a potential bug with undefined pointers, i.e. init did not attach a dataspace at this virtual address.

> The given IP points to the function Genode::strncpy(). I also wonder why the system wants to read from virtual address 0x6000, because all modules are allocated beginning at virtual address 0x02000000.

On Genode, the core service RM (region manager) manages the address spaces of processes. When init creates a new RAM dataspace and attaches it to its virtual address space, RM looks up an unused region that fits the dataspace.

> Checking init's pagetable with OKL4's KDB on qemu shows a number of allocations below:
>
> 00000000 [00140027]: tree=f0140000
>
> [...]
>
> 00066000 [00370067]: phys=00370000 pg=f0140198 4KB rwx (RWX) user WB

Looks good and normal to me ;-)

> On the IVI platform, this area looks like this at the time of the page fault:
>
> 00000000 [00141027]: tree=f0141000
>
> 00001000 [001f5067]: phys=001f5000 pg=f0141004 4KB rwx (RWX) user WB
>
> 00005000 [001df025]: phys=001df000 pg=f0141014 4KB r~x (R~X) user WB
>
> I'd like to get some hints on where to look in the code to find the cause of the problem. Since I cannot debug the platform, I will probably have to add more trace messages to get additional information about what is going on.

I have no idea what happened, but files you should have a look at are:

base-okl4/src/core/rm_session_support.cc (set verbose_unmap)

base/src/core/rm_session_component.cc (set verbose and verbose_page_faults)

Good luck

-- Christian Helmuth Genode Labs http://www.genode-labs.com/ · http://genode.org/

Norman Feske

2 Aug

3:38 p.m.

Hello Frank,

I think you hit an issue with the handling of boot modules on OKL4. In contrast to running on Qemu, on real hardware the padding space between boot modules is not cleared on startup, so there is a chance that the actual data is followed by garbage. This is particularly annoying for the config file. We directly pass the locally mapped config file to our XML parser, which expects a null termination. However, without initial clearing of memory, there may be no such termination, so the XML parser continues parsing until it hits the following (unmapped) page. The next release will fix the problem by allowing a length limit to be specified to the XML parser. For now, you can use the short-term fix of manually appending a zero character to your config file.

I would be grateful to know if I'm guessing right and if this quick fix works for you.

Regards Norman


-- Norman Feske Genode Labs http://www.genode-labs.com · http://genode.org
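
The fault address fits this explanation if one looks at the page-table dump from the IVI platform. The following is only a sketch of the arithmetic, assuming 4 KB pages (as the dumps show) and assuming that the read-only page at 0x5000 is where the config module is mapped, which the thread does not confirm. The helper names are hypothetical.

```cpp
#include <cstdint>

// 4 KB pages, as indicated by the "4KB" column in the KDB dumps.
constexpr std::uint32_t page_size = 0x1000;

// Base address of the page that contains addr.
constexpr std::uint32_t page_base(std::uint32_t addr)
{
    return addr & ~(page_size - 1);
}

// First address past the page that contains addr.
constexpr std::uint32_t next_page(std::uint32_t addr)
{
    return page_base(addr) + page_size;
}

// Assumption: the read-only (r~x) page at 0x5000 holds the config file.
constexpr std::uint32_t config_page = 0x5000;

// Fault address taken from "no RM attachment (READ pf_addr=6000 ...)".
constexpr std::uint32_t pf_addr = 0x6000;

// A scan that runs off the end of the page at 0x5000 touches 0x6000 first,
// and the IVI dump shows no mapping there.
static_assert(next_page(config_page) == pf_addr,
              "fault address is the first byte after the assumed config page");
```

Under these assumptions, a read at 0x6000 is exactly what an unterminated scan starting somewhere inside the page at 0x5000 would produce.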

Frank Kaiser

3 Aug

9:17 a.m.

Hello, Norman

Your guess is right: the page fault occurs while parsing the config file. The trigger is the method Xml_node::content(), which tries to copy the process's filename from the config file, but the root cause is a nasty bug in the function Genode::strncpy(), which is used to obtain the filename. In the function's first line, Genode::strlen() is used to determine the length of the source string. In the given case, where the source is a tagged item of the config file without null termination, strlen() runs through memory until it happens to find a null character. In my opinion, Genode::strncpy() must not read the source string beyond the given size argument. Your suggestion of appending a null character to the config file (by the way: how can this be done without corrupting the XML syntax?) cures a symptom, but does not solve the root cause.

I tried to fix Genode::strncpy() myself. Since there is no function Genode::strnlen(), I made the following change:

    size_t i = 0;

    while (i < size)
    {
        if (src[i] == 0)
        {
            size = i;
            break;
        }
        ++i;
    }

Interestingly, this seems to trigger another problem. Now I get the following two errors on all platforms:

virtual Genode::Session_capability Genode::Core_parent::session(const char*, const char*): service_name="RM" arg="ram_quota=4K" not handled

virtual Genode::Session_capability Genode::Core_parent::session(const char*, const char*): service_name="PD" arg="ram_quota=4K" not handled

Could it be that there are already some workarounds for buggy Genode::strncpy(), which do not work anymore once the function is fixed?

Frank


Norman Feske

1:08 p.m.

Hi Frank,

Thanks for your investigation. We have also hit this issue on real hardware (hence my initial guess), and it will be fixed in the upcoming release. Until then, I hope you are fine with the interim solution of appending the zero termination manually. Of course, the trailing null character does not comply with the XML syntax. It's just a workaround.

Regards Norman


Frank Kaiser

4:18 p.m.

Hi, Norman

I prefer to fix the root cause. However, my earlier attempt did not work, since it does not take into account that the function writes a '\0' at the end of the destination string (something the standard C library function does not always do), for which the calculated size value has to be adjusted. The final fix of Genode::strncpy() is:

    size_t i = 0;

    for (; i < (size - 1); ++i) // last char will be set to \0 anyway
    {
        if (src[i] == 0)
        {
            size = i + 1; // leave room for \0 char
            break;
        }
    }

Frank
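
The difference between the two attempts can be reproduced in isolation. The sketch below assumes that the rest of the copy routine copies size - 1 bytes and then writes the terminator at dst[size - 1]; that tail is not quoted in the thread, so it is an assumption, and all function names are hypothetical.

```cpp
#include <cstddef>
#include <cstring>

// First attempt from the thread: clamps size to the string length.
std::size_t clamp_first_attempt(const char *src, std::size_t size)
{
    std::size_t i = 0;
    while (i < size) {
        if (src[i] == 0) { size = i; break; }
        ++i;
    }
    return size;
}

// Final fix from the thread: clamps size to the string length plus one.
// Written as "i + 1 < size" so that size == 0 cannot wrap around.
std::size_t clamp_final(const char *src, std::size_t size)
{
    for (std::size_t i = 0; i + 1 < size; ++i) {
        if (src[i] == 0) { size = i + 1; break; }
    }
    return size;
}

// Assumed tail of the copy routine: copy size - 1 bytes, then always
// terminate the destination.
char *copy_with(std::size_t clamped, char *dst, const char *src)
{
    if (clamped == 0) return dst;  // no room to write anything
    std::memcpy(dst, src, clamped - 1);
    dst[clamped - 1] = 0;
    return dst;
}
```

With an 8-byte destination and the source "RM", the first attempt clamps size to 2, so only "R" is copied before the terminator, whereas the final fix clamps size to 3 and yields "RM". A one-character truncation of this kind would fit the "not handled" errors, though the thread does not spell that out.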


Norman Feske

4 Aug

9:36 a.m.

Hi Frank,

Frank Kaiser wrote:

> I prefer to fix the root cause. However my attempt outlined below did not work, since it does not take into account that the function writes a '\0' at the end of the destination string (something the standard C library function doesn't do), for which the calculated size value has to be adjusted. The final fix of Genode::strncpy() is:

Indeed. The libc version gives no indication of whether the string was cut or not. So you would need to check dst[size - 1] == 0 for the zero padding (which our version does not implement). So we decided to ensure that the result of the function is always a properly terminated string.

The strncpy function is only a part of a bigger problem, which is the reason why we deferred the fix until now. The root of the problem is that the end of a dataspace acquired from core's ROM service cannot be expected to be padded with zeros. In the corner case of a data module with the exact size of 4096 bytes, there is no padding at all. However, we mistakenly passed the local address of the mapped dataspace directly to the Xml_node constructor for parsing the config file. The constructor, however, expected a null-terminated string. Our fix introduces a further constructor argument for specifying the maximum length of the string. Respecting this boundary properly needed code changes in the XML parser, the tokenizer, and some string functions (e.g., ascii_to_ulong). The strncpy function is the final element in the chain of troublemakers ;-)

From the implementation viewpoint, the strncpy function actually complied with the function interface and was not buggy. The interface expects a string as argument 'src', which is, by definition, null-terminated. The size argument is normally used to specify the boundary of the 'dst' buffer, which worked correctly. The problem is that strncpy is called with an invalid 'src' argument and the implicit assumption that the function will not touch memory beyond 'src + size - 1'. However, we need these semantics for our particular dataspace-parsing use case.

I have checked the complete fix into our SVN. For strncpy, I went for a single loop rather than two loops (checking the size and memcpy), and I think that the resulting code is more obvious. In the process, I also extended the documentation to cover the differences between our implementation and the libc version.

Best regards Norman
