Hi,
after some time of inactivity I decided to get back to my work on porting Genode to different Raspberry Pi devices. In the meantime a major blocker (a problem with the USB host driver) seems to be resolved, and there have been many improvements related to introducing new boards (platform driver, unified naming of depot packages, etc.), so I'm optimistic about the results. However, I ran into a problem that has stopped me for quite some time now, and I'm running out of ideas.
I started with an attempt to make run/ping work on Raspberry Pi 2. I thought it would be straightforward because:
* it already works on Raspberry Pi 1
* Raspberry Pi 2 has the same USB and network device
* the interrupt controller is similar and I already had an implementation for it working on an earlier Genode release
By adding some logs I've verified that on both variants (Pi 1 and Pi 2):
* 'rpi_usb_host_drv' asked for a device from 'rpi_new_platform_drv'
* 'Irq_session_component::sigh(...)' was called for IRQ 9 (USB device interrupt) - I believe this was due to an RPC from 'rpi_usb_host_drv'
* 'Irq::Context' constructor was called in 'rpi_usb_host_drv' and after that 'Irq_session_component::sigh()' and 'Irq_session_component::ack_irq()' were called for interrupt 9
* after some device driver initialization and discovery 'rpi_usb_host_drv' prints log 'dev_dbg: Calling enable_global_interrupts' and after that IRQ 9 is detected in `bcm2835_pic.cc` and 'Kernel::User_irq::occured()' for that interrupt is called.
After that there is a difference. On Raspberry Pi 1, 'Lx_kit::Irq::Context::handle_irq()' in 'rpi_usb_host_drv' is executed, but on Raspberry Pi 2 it is not.
As I verified that the interrupt is detected and, AFAIU, processed properly at the base-hw level, I don't see any other device-specific things that could cause this behavior.
Is it possible that the new platform driver somehow 'blocks' delivery of this interrupt? I don't see how that could be, as the calls to 'sigh()' and 'ack_irq()' worked.
Can someone share some ideas about what could be wrong or how I should try to debug such a problem? Maybe it is something obvious and only I can't find it.
Currently my branch is based on master from the middle of August, and if that helps I can publish it after some minor cleanup.
Tomasz Gajewski
Hello again,
I've been able to move forward a little, so I'm answering myself.
> Can someone share some ideas about what could be wrong or how I should try to debug such a problem? Maybe it is something obvious and only I can't find it.
With the help of the trace machinery and the 'rpc_name' policy (which I had forgotten about until today) I found out that the driver probably gets stuck somewhere, as at some point it completely stops any RPC activity. Given that I know which RPC calls it makes before the hang and what it does next when it works properly, I should be able to hunt down the problem and hopefully fix it.
Tomasz Gajewski
Hi again,
I'm completely out of ideas and need to ask for help.
I've been able to get a little further with tracing what causes 'rpi_usb_host_drv' to stop working. As I wrote earlier, I believed that the interrupt was not properly delivered. Now I know that I was wrong and I have some more details, but I still don't know how to fix the problem.
I'm compiling and running `usb_hid_raw.run` and `ping.run` on rpi1, rpi2 and rpi3 (64bit). I know exactly where the processing stops but I don't know why.
Generally, the problem is somewhere after enabling interrupts in `rpi_usb_host_drv`, which produces the log:
dev_dbg: Calling enable_global_interrupts
and expected log:
Task::run: irq_20
which in some cases does not happen.
The `usb_hid_raw` scenario works similarly on all rpi variants, producing essentially the same logs (they differ only in some addresses):
    dev_dbg: Calling enable_global_interrupts
    DWC_MODIFY_REG32: 10008
    dev_dbg: Done
    device_add(): Probe return 0
    Task::block: linux
    Task::schedule: linux
    Task::run: device_worker
    Task::block: device_worker
    Task::schedule: device_worker
    MG_acquire: Signal_receiver::pending_signal 0xc33b4
    MG_acquired: Signal_receiver::pending_signal 0xc33b4
    MG_release: Signal_receiver::pending_signal 0xc33b4
    MG_released: Signal_receiver::pending_signal 0xc33b4
    MG_acquire: Signal_receiver::pending_signal 0xc33b4
    MG_acquired: Signal_receiver::pending_signal 0xc33b4
    MG_release: Signal_receiver::pending_signal 0xc33b4
    MG_released: Signal_receiver::pending_signal 0xc33b4
    unblock /projects/genode/genode/repos/dde_linux/src/lib/legacy/lx_kit/irq.cc:149
    Task::run: irq_20
    handle_irq /projects/genode/genode/repos/dde_linux/src/lib/legacy/lx_kit/irq.cc:161
So everything is ok in this area.
But when I run the `ping` scenario, the produced logs look like below (the logs are captured using tracing and also contain information about RPC calls):
    dev_dbg: Calling enable_global_interrupts
    DWC_MODIFY_REG32: 20008
    dev_dbg: Done
    device_add(): Probe return 0
    +update -update
    Task::block: linux
    Task::schedule: linux
    Task::run: device_worker
    +alloc -alloc
    +attach -attach
    Task::block: device_worker
    Task::schedule: device_worker
    <signal *session
    +alloc -alloc
    +attach -attach
    +size -size
    Signal_receiver::constructor(0x1400f10c)/projects/genode/genode/depot/tomga/src/base-hw-rpi/2021-10-14/src/lib/base/signal_receiver.cc:58 0x1400f134
    Signal_receiver::constructor(1, 0x1400f130)
    Signal_receiver::constructor(2, 0x1400f14c)
    MG_acquire: Signal_receiver::Signal_receiver 0x1400f134
On Raspberry Pi 2 and 3 there are no other logs from this component.
On Raspberry Pi 3, however, there is a message that will probably allow someone to explain to me what is going on; unfortunately I have failed to explain it myself for quite some time. This message is:
    Kernel: MMU-fault not handled ESR=0x92000035
    Kernel: init -> drivers -> rpi_usb_host_drv -> ep raised unhandled MMU fault ip=0x70e80 fault-addr=0x1400c200 type=unknown
and the fault-addr is exactly the address of the mutex that is being locked. I've added two additional fields to this class (one before the mutex and one after it) to verify that I can access (and modify) their values, and I can. But there is some problem with locking the mutex that causes a silent hang on Raspberry Pi 2 (arm_v7a), an MMU fault on Raspberry Pi 3 (arm_v8a), and works without problems on Raspberry Pi 1 (arm_v6).
All later logs appear only on Raspberry Pi 1 where this driver seems to work.
    MG_acquired: Signal_receiver::Signal_receiver 0x1400f134
    MG_release: Signal_receiver::Signal_receiver 0x1400f134
    MG_released: Signal_receiver::Signal_receiver 0x1400f134
    +alloc_signal_source -alloc_signal_source
    Signal_receiver::manage(0x1400f10c)/projects/genode/genode/depot/tomga/src/base-hw-rpi/2021-10-14/src/lib/base/signal_receiver.cc:112 0x1400f134
    MG_acquire: Signal_receiver::manage 0x1400f134
    MG_acquired: Signal_receiver::manage 0x1400f134
    Signal_receiver::manage(0x1400f10c)/projects/genode/genode/depot/tomga/src/base-hw-rpi/2021-10-14/src/lib/base/signal_receiver.cc:118
    Signal_receiver::manage(0x1400f10c)/projects/genode/genode/depot/tomga/src/base-hw-rpi/2021-10-14/src/lib/base/signal_receiver.cc:120
    Signal_receiver::manage(0x1400f10c)/projects/genode/genode/depot/tomga/src/base-hw-rpi/2021-10-14/src/lib/base/signal_receiver.cc:123
    Signal_receiver::manage(0x1400f10c)/projects/genode/genode/depot/tomga/src/base-hw-rpi/2021-10-14/src/lib/base/signal_receiver.cc:132
    +alloc_context -alloc_context
    Signal_receiver::manage(0x1400f10c)/projects/genode/genode/depot/tomga/src/base-hw-rpi/2021-10-14/src/lib/base/signal_receiver.cc:134
    MG_release: Signal_receiver::manage 0x1400f134
    MG_released: Signal_receiver::manage 0x1400f134
    Signal_receiver::constructor(0x1400f1d4)/projects/genode/genode/depot/tomga/src/base-hw-rpi/2021-10-14/src/lib/base/signal_receiver.cc:58 0x1400f1fc
    Signal_receiver::constructor(1, 0x1400f1f8)
    Signal_receiver::constructor(2, 0x1400f214)
    MG_acquire: Signal_receiver::Signal_receiver 0x1400f1fc
    MG_acquired: Signal_receiver::Signal_receiver 0x1400f1fc
    MG_release: Signal_receiver::Signal_receiver 0x1400f1fc
    MG_released: Signal_receiver::Signal_receiver 0x1400f1fc
    +alloc_signal_source -alloc_signal_source
    Signal_receiver::manage(0x1400f1d4)/projects/genode/genode/depot/tomga/src/base-hw-rpi/2021-10-14/src/lib/base/signal_receiver.cc:112 0x1400f1fc
    MG_acquire: Signal_receiver::manage 0x1400f1fc
    MG_acquired: Signal_receiver::manage 0x1400f1fc
    Signal_receiver::manage(0x1400f1d4)/projects/genode/genode/depot/tomga/src/base-hw-rpi/2021-10-14/src/lib/base/signal_receiver.cc:118
    Signal_receiver::manage(0x1400f1d4)/projects/genode/genode/depot/tomga/src/base-hw-rpi/2021-10-14/src/lib/base/signal_receiver.cc:120
    Signal_receiver::manage(0x1400f1d4)/projects/genode/genode/depot/tomga/src/base-hw-rpi/2021-10-14/src/lib/base/signal_receiver.cc:123
    Signal_receiver::manage(0x1400f1d4)/projects/genode/genode/depot/tomga/src/base-hw-rpi/2021-10-14/src/lib/base/signal_receiver.cc:132
    +alloc_context -alloc_context
    Signal_receiver::manage(0x1400f1d4)/projects/genode/genode/depot/tomga/src/base-hw-rpi/2021-10-14/src/lib/base/signal_receiver.cc:134
    MG_release: Signal_receiver::manage 0x1400f1fc
    MG_released: Signal_receiver::manage 0x1400f1fc
    +alloc_rpc_cap -alloc_rpc_cap
    Signal_receiver::manage(0xc338c)/projects/genode/genode/depot/tomga/src/base-hw-rpi/2021-10-14/src/lib/base/signal_receiver.cc:112 0xc33b4
    MG_acquire: Signal_receiver::manage 0xc33b4
    MG_acquired: Signal_receiver::manage 0xc33b4
    Signal_receiver::manage(0xc338c)/projects/genode/genode/depot/tomga/src/base-hw-rpi/2021-10-14/src/lib/base/signal_receiver.cc:118
    Signal_receiver::manage(0xc338c)/projects/genode/genode/depot/tomga/src/base-hw-rpi/2021-10-14/src/lib/base/signal_receiver.cc:120
    Signal_receiver::manage(0xc338c)/projects/genode/genode/depot/tomga/src/base-hw-rpi/2021-10-14/src/lib/base/signal_receiver.cc:123
    Signal_receiver::manage(0xc338c)/projects/genode/genode/depot/tomga/src/base-hw-rpi/2021-10-14/src/lib/base/signal_receiver.cc:132
    +alloc_context -alloc_context
    Signal_receiver::manage(0xc338c)/projects/genode/genode/depot/tomga/src/base-hw-rpi/2021-10-14/src/lib/base/signal_receiver.cc:134
    MG_release: Signal_receiver::manage 0xc33b4
    MG_released: Signal_receiver::manage 0xc33b4
    Signal_receiver::manage(0xc338c)/projects/genode/genode/depot/tomga/src/base-hw-rpi/2021-10-14/src/lib/base/signal_receiver.cc:112 0xc33b4
    MG_acquire: Signal_receiver::manage 0xc33b4
    MG_acquired: Signal_receiver::manage 0xc33b4
    Signal_receiver::manage(0xc338c)/projects/genode/genode/depot/tomga/src/base-hw-rpi/2021-10-14/src/lib/base/signal_receiver.cc:118
    Signal_receiver::manage(0xc338c)/projects/genode/genode/depot/tomga/src/base-hw-rpi/2021-10-14/src/lib/base/signal_receiver.cc:120
    Signal_receiver::manage(0xc338c)/projects/genode/genode/depot/tomga/src/base-hw-rpi/2021-10-14/src/lib/base/signal_receiver.cc:123
    Signal_receiver::manage(0xc338c)/projects/genode/genode/depot/tomga/src/base-hw-rpi/2021-10-14/src/lib/base/signal_receiver.cc:132
    +alloc_context -alloc_context
    Signal_receiver::manage(0xc338c)/projects/genode/genode/depot/tomga/src/base-hw-rpi/2021-10-14/src/lib/base/signal_receiver.cc:134
    MG_release: Signal_receiver::manage 0xc33b4
    MG_released: Signal_receiver::manage 0xc33b4
    +alloc_rpc_cap -alloc_rpc_cap
    <session *signal
    MG_acquire: Signal_receiver::pending_signal 0xc33b4
    MG_acquired: Signal_receiver::pending_signal 0xc33b4
    MG_release: Signal_receiver::pending_signal 0xc33b4
    MG_released: Signal_receiver::pending_signal 0xc33b4
    +elapsed_us -elapsed_us
    +elapsed_us -elapsed_us
    +elapsed_us -elapsed_us
    +elapsed_us -elapsed_us
    +elapsed_us -elapsed_us
    +elapsed_us -elapsed_us
    +trigger_once -trigger_once
    <signal *signal
    MG_acquire: Signal_receiver::pending_signal 0xc33b4
    MG_acquired: Signal_receiver::pending_signal 0xc33b4
    MG_release: Signal_receiver::pending_signal 0xc33b4
    MG_released: Signal_receiver::pending_signal 0xc33b4
    unblock /projects/genode/genode/depot/tomga/src/usb_host_drv/2021-10-14/src/lib/legacy/lx_kit/irq.cc:149
    Task::run: irq_20
    handle_irq /projects/genode/genode/depot/tomga/src/usb_host_drv/2021-10-14/src/lib/legacy/lx_kit/irq.cc:161
I'm really out of ideas now and I'm asking for help.
My `rpi_master_20210815` branch (full of added debugging code) is at [1] in case someone would like to look, but I am mostly asking for ideas about what could cause such behavior and advice on how to fix it.
Regards Tomasz Gajewski
[1] https://github.com/tomga/genode/tree/rpi_master_20210815
Hello Tomasz,
On Fri, Oct 15, 2021 at 01:06:26AM +0200, Tomasz Gajewski wrote:
> On Raspberry Pi 2 and 3 there are no other logs from this component. On Raspberry Pi 3 there is however a message that will probably allow someone to explain me what is going on but unfortunately I failed to explain it by myself for quite some time. This message is:
>
>     Kernel: MMU-fault not handled ESR=0x92000035
>     Kernel: init -> drivers -> rpi_usb_host_drv -> ep raised unhandled MMU fault ip=0x70e80 fault-addr=0x1400c200 type=unknown
>
> and the fault-addr is exactly an address of the mutex that is being locked. I've added two additional fields in this class (one before the mutex and one after it) to verify that I can access (and modify) their values and I can. But there is some problem with locking the mutex that causes silent hang on Raspberry Pi 2 (arm_v7a) MMU-fault on Raspberry Pi 3 (arm_v8a) and works without problems on Raspberry Pi 1 (arm_v6).
The MMU fault means: "unsupported exclusive or atomic access", which is probably due to the type of memory the mutex resides in. Could it be that the related memory is uncached? In that case the error would make sense, because the monitoring of exclusive/atomic accesses is tightly coupled with the shareability/cache settings in the page flags of the corresponding memory.
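The reported syndrome value can be cross-checked by decoding it by hand. The sketch below assumes the AArch64 ESR_ELx layout from the Arm architecture reference manual (exception class in bits [31:26], data fault status code in bits [5:0], write-not-read flag in bit 6). The struct, enum and function names are made up for illustration and are not part of Genode.

```cpp
#include <cassert>
#include <cstdint>

/* Hypothetical helper for the ESR_ELx fields relevant to data aborts */
struct Esr
{
    uint64_t value;

    /* EC, bits [31:26]: exception class */
    constexpr unsigned exception_class() const { return (value >> 26) & 0x3f; }

    /* DFSC, bits [5:0]: data fault status code */
    constexpr unsigned fault_status() const { return value & 0x3f; }

    /* WnR, bit [6]: true if the faulting access was a write */
    constexpr bool write() const { return ((value >> 6) & 1) != 0; }
};

enum : unsigned {
    EC_DATA_ABORT_LOWER_EL = 0x24,  /* data abort taken from a lower EL */
    EC_DATA_ABORT_SAME_EL  = 0x25,  /* data abort taken from the same EL */
    DFSC_UNSUPPORTED_EXCL  = 0x35,  /* unsupported exclusive or atomic access */
};

/* True if the syndrome describes an unsupported exclusive/atomic access */
constexpr bool unsupported_exclusive_access(Esr esr)
{
    return (esr.exception_class() == EC_DATA_ABORT_LOWER_EL ||
            esr.exception_class() == EC_DATA_ABORT_SAME_EL)
        && esr.fault_status() == DFSC_UNSUPPORTED_EXCL;
}

/* Usage: check the value printed by the kernel in the message above */
inline void check_reported_esr()
{
    constexpr Esr esr { 0x92000035 };
    assert(esr.exception_class() == EC_DATA_ABORT_LOWER_EL);
    assert(esr.fault_status()    == DFSC_UNSUPPORTED_EXCL);
    assert(unsupported_exclusive_access(esr));
}
```

For 0x92000035 this yields exception class 0x24 (data abort from a lower exception level) and fault status 0x35, the implementation-defined code that Arm documents as "Unsupported Exclusive or Atomic access".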
Just a general remark: I know you have spent much time on these issues, and it might be frustrating to restart your attempts to some extent. But maybe following our new approach of porting Linux drivers, which preserves more of the original semantics, might be more sustainable for you too? At least Linux semantics like: this memory is cached/uncached, cache maintenance operations, memory barriers etc. are handled exactly like in the original code when following the new DDE principles. Also, first blueprint drivers exist for USB, SD-card, Ethernet, display manager etc. within the external repositories:
https://github.com/nfeske/genode-allwinner.git
https://github.com/skalk/genode-imx.git
Maybe, it is more fruitful to restart at least the USB host driver with this new approach? But it's just an idea, no pressure ;-).
Best regards Stefan
Thank you Stefan for quick response.
Stefan Kalkowski stefan.kalkowski@genode-labs.com writes:
> Hello Tomasz,
>
> On Fri, Oct 15, 2021 at 01:06:26AM +0200, Tomasz Gajewski wrote:
> > On Raspberry Pi 2 and 3 there are no other logs from this component. On Raspberry Pi 3 there is however a message that will probably allow someone to explain me what is going on but unfortunately I failed to explain it by myself for quite some time. This message is:
> >
> >     Kernel: MMU-fault not handled ESR=0x92000035
> >     Kernel: init -> drivers -> rpi_usb_host_drv -> ep raised unhandled MMU fault ip=0x70e80 fault-addr=0x1400c200 type=unknown
> >
> > and the fault-addr is exactly an address of the mutex that is being locked. I've added two additional fields in this class (one before the mutex and one after it) to verify that I can access (and modify) their values and I can. But there is some problem with locking the mutex that causes silent hang on Raspberry Pi 2 (arm_v7a) MMU-fault on Raspberry Pi 3 (arm_v8a) and works without problems on Raspberry Pi 1 (arm_v6).
>
> The MMU fault means: "unsupported exclusive or atomic access", which is probably due to the type of memory the mutex resides in. Could it be that the related memory is uncached? In that case the error would make sense, because the monitoring of exclusive/atomic accesses is tightly coupled with the shareability/cache settings in the page flags of the corresponding memory.
This information is definitely important and gives me something I can trace. Thank you.
Nevertheless, I'm curious why there is no similar message when executing on arm_v7a even though the error probably happens there too. The first message is missing for this architecture (in [1] I tried to add a log similar to the one for arm_v8a). Unfortunately nothing about the error is printed, and that's why I initially suspected problems with interrupt delivery, especially as the new platform driver introduced changes in the handling of interrupts.
> Just a general remark: I know you have spent much time on these issues, and it might be frustrating to restart your attempts to some extent. But maybe following our new approach of porting Linux drivers, which preserves more of the original semantics, might be more sustainable for you too? At least Linux semantics like: this memory is cached/uncached, cache maintenance operations, memory barriers etc. are handled exactly like in the original code when following the new DDE principles.
I'm well aware of this, and maybe I'll make an attempt to start again using this new approach, but I wouldn't like to fight with too many changes at once. There have been many changes in this area in recent months. They are all great, but each requires some porting, and if there are too many of them at once this work becomes incomprehensible for me in the limited time I can spend on it.
On the other hand, as this driver was somewhat working (not really stable at that time) for rpi 1/2/3, I initially want to rebase it to the current release. Unfortunately, introducing the platform driver changed some behavior, which causes trouble. I hope I'll be able to make it work and maybe try an upgrade later. Being able to compare behavior between the working and the new implementation would definitely help with troubleshooting.
[1] https://github.com/tomga/genode/blob/8aa1a006d33de38b879c0acced7231f9513273f...
Regards Tomasz Gajewski