Re: Genode on RPI

29 Jan 2019

Hello Tomasz,
thank you very much for sharing your experimentation efforts and
insights and for starting the discussion. Please, see my comments
inline below.
On Mon, Jan 28, 2019 at 06:31:33PM +0100, Tomasz Gajewski wrote:
...
Hi,
the last weeks of attempts gave me some small progress with the code,
an understanding of some aspects of the ARM (AArch32) architecture, and
some knowledge about how the low-level hw kernel is implemented.
I have some thoughts and questions about how some issues should/could be
implemented that I'd like to discuss.
Firstly I think that in my ignorance I took a wrong path to experiment
with Raspberry Pi 3B+. It has a Cortex A53 (ARMv8-A) processor but as I
want to target AArch32 (and initially without SMP) I thought that
modifying the current rpi target was the best choice. Now I think that I
should have started with the newest ARM processor supported by hw
(Cortex A15, if I understand correctly), and then I would not have been
stopped by the issues I had. Or maybe not...
Indeed, forking the target for the most recent processor supported by hw
might be a better starting point.
...
When last time I wrote about my progress I couldn't pass through
following code:
...
sctlr = Cpu::Sctlr::read();
Cpu::Sctlr::C::set(sctlr, 1);   // enable data cache
Cpu::Sctlr::I::set(sctlr, 1);   // enable instruction cache
Cpu::Sctlr::M::set(sctlr, 1);   // enable mmu
Cpu::Sctlr::write(sctlr);
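
For readers following along: each set() call above flips one bit of the
32-bit SCTLR value before it is written back. The sketch below mimics that
on a plain integer. The bit positions (M = bit 0, C = bit 2, I = bit 12)
are those of the ARMv7-A SCTLR; the names Sctlr_bit, set_bit and
without_dcache are made up for this illustration and are not Genode's
register framework.

```cpp
#include <cassert>
#include <cstdint>

// Bit positions within the ARMv7-A System Control Register (SCTLR).
enum class Sctlr_bit : unsigned {
    M = 0,   // MMU enable
    C = 2,   // data cache enable
    I = 12,  // instruction cache enable
};

// Hypothetical helper: return 'value' with the given bit set or cleared.
constexpr std::uint32_t set_bit(std::uint32_t value, Sctlr_bit bit, bool on)
{
    std::uint32_t const mask = std::uint32_t{1} << static_cast<unsigned>(bit);
    return on ? (value | mask) : (value & ~mask);
}

// The experiment described in the text: instruction cache and MMU on,
// data cache off.
constexpr std::uint32_t without_dcache(std::uint32_t sctlr)
{
    sctlr = set_bit(sctlr, Sctlr_bit::I, true);
    sctlr = set_bit(sctlr, Sctlr_bit::M, true);
    return set_bit(sctlr, Sctlr_bit::C, false);
}
```

On real hardware the value would come from and go back to the coprocessor
register; here it is only an integer, so the sketch can be compiled and
checked on a host machine.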

After reading and making different attempts, I verified that enabling
the instruction cache does not cause problems. Enabling either the data
cache or the MMU individually lets me get further, but enabling both does
not. I thought that for initial experiments I could live without the data
cache, but soon I found out that I was wrong.
After many attempts I found the place where it halted. It was deep inside
Genode::log in ... assembly in atomic.h in cmpxchg. Reading gave me
another unpleasant surprise: it will not work if I don't have the data
cache enabled... So I had to go back.
Yes, we are using ldrex/strex especially to support multi-processor
systems. Those instructions use cache-coherency mechanisms of the
processors when working on "cacheable" memory. If you did not turn on
caches and the snoop-control-unit (SMP bit in Armv7), the cmpxchg won't
work properly.
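
To illustrate the dependency, here is a minimal sketch of a spinlock built
on compare-and-exchange. It uses std::atomic instead of Genode's own
atomic.h, and the class name Spinlock is hypothetical. On Armv7 a compiler
typically lowers compare_exchange_strong to an ldrex/strex retry loop,
which is the instruction pair discussed above.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical spinlock for illustration; not Genode's implementation.
class Spinlock
{
    std::atomic<int> _state { 0 };   // 0 = unlocked, 1 = locked

public:
    // Single attempt: succeeds only if the lock was free.
    bool try_lock()
    {
        int expected = 0;
        return _state.compare_exchange_strong(expected, 1,
                                              std::memory_order_acquire,
                                              std::memory_order_relaxed);
    }

    // Retry until the exchange succeeds. If the exclusive store can
    // never succeed, this loop never terminates.
    void lock() { while (!try_lock()) { } }

    void unlock() { _state.store(0, std::memory_order_release); }
};
```

A lock() call that never returns is one way a hang "deep inside cmpxchg"
could show up when the exclusive store keeps failing.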
...
After two weeks of poking around I finally found a workaround that
allowed me to go further. I changed Page_flags for all types from
CACHED to UNCACHED, which allowed me to get past enabling the MMU with
the data cache enabled (of course without real caching) and past the
spinlock in atomic.h.
The next thing was to get the UART working. Unfortunately, version 3 has
a different UART device enabled by default (it has two), and it required
a different driver, which I implemented.
Seeing "kernel initialized" arrive over the serial connection was a real
pleasure after analyzing logs in memory for two weeks.
So I thought that it is a good moment to write my thoughts and
questions.
For writing traces to memory I've created a simple utility (set of
macros in assembly and C++) to write simple debug values to a buffer in
memory. I used it to diagnose what is going on before serial connection
is working.
It is specified by a buffer address and size. At the beginning a pointer
to current position in buffer is stored. A 1-1 mapping is inserted into
TLB for this region and I had to make changes in virtual memory layout
of core. Currently addresses are hardcoded but I can polish it to a
state where it could be enabled/disabled with some build option or some
defines in one place in the code if you'd like to use this. I pushed
current state of my work [1] (without any cleanup yet). Utility macros
are in:

repos/base/include/base/memtrace.h
repos/base-hw/src/bootstrap/spec/arm/crt0.s

Would you be interested in adding something like this to Genode? I would
create an issue and propose a version.
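
The buffer layout described above (a position marker at the start of the
region, followed by the traced values) can be sketched roughly as follows.
This is not the code from memtrace.h: the class name Mem_trace, the use of
32-bit words, and storing an index rather than a raw pointer are all
assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout: word 0 holds the index of the next free slot,
// the remaining words hold the traced values.
class Mem_trace
{
    std::uint32_t volatile * const _base;   // start of the trace region
    std::size_t              const _words;  // region size in 32-bit words

public:
    Mem_trace(std::uint32_t volatile *base, std::size_t words)
    : _base(base), _words(words)
    {
        assert(words >= 2);
        _base[0] = 1;   // first free slot follows the position word
    }

    // Append one value; returns false once the region is full.
    bool put(std::uint32_t value)
    {
        std::uint32_t const pos = _base[0];
        if (pos >= _words) return false;
        _base[pos] = value;
        _base[0]   = pos + 1;
        return true;
    }

    // Number of values stored so far.
    std::size_t count() const { return _base[0] - 1; }
};
```

Keeping the position inside the region itself means that a memory dump of
the region alone is enough to tell how far execution got.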
Very cool that you could help yourself that way!
In general, we already have a tracing mechanism in Genode, where
tracing points are collected in memory, and might get aggregated even
from many different components for debug or optimization reasons.
Anyway, that mechanism is based on core's functionality and therefore
only appropriate for components running on top of core/kernel.
Another option used in NOVA/Genode is to write all kernel messages
into a memory buffer and export it to the userland. Thereby, you can
aggregate messages in headless systems or systems without AMT or
serial line. I think this way might be appropriate for base-hw too.
Anyway, I would not introduce new debug macros, but just add a simple
serial driver, which does not use a special device but a portion of
memory for printing. Then one can switch the UART to that model if no
serial device is available, or if it is troublesome. What do you think
about that approach?
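
A rough sketch of that idea, under assumptions: a driver that offers the
same put_char() shape as a UART driver but stores characters in a ring
buffer in RAM. The names Memory_uart, Dummy_uart and print are
hypothetical, and the exact interface that bootstrap and core expect from
a serial driver is assumed here rather than taken from the sources.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical driver: characters end up in a ring buffer in RAM
// instead of a device register.
class Memory_uart
{
    char * const      _buf;
    std::size_t const _size;
    std::size_t       _written = 0;   // total characters ever written

public:
    Memory_uart(char *buf, std::size_t size) : _buf(buf), _size(size)
    {
        assert(size > 0);
    }

    // The oldest output is overwritten once the buffer wraps around.
    void put_char(char c) { _buf[_written++ % _size] = c; }

    std::size_t written() const { return _written; }
};

// Stand-in for a real device driver, only here to show the switch.
struct Dummy_uart { void put_char(char) { } };

// Switching the output channel is then a one-line change.
//using Serial = Dummy_uart;
using Serial = Memory_uart;

// Print a NUL-terminated string through any driver offering put_char().
template <typename Driver>
void print(Driver &uart, char const *s) { while (*s) uart.put_char(*s++); }
```

With this shape, no new debug macros are needed: existing log output goes
through the driver and can be read back from the buffer later.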
I have to admit that usually I can use a JTAG debugger on most
platforms to tackle such early initialization problems. On the other
hand, I'm quite cautious about adding pure debug features to the
"microkernel" ;-).
...
I have some general doubts about how some issues could be resolved. I
started experimenting with the idea of making a Sculpt for Raspberry Pi
possible. And I thought it could be created in such a way that a single
binary could run on any Pi. I knew that they are all ARM devices and,
even though some are 64-bit, they can work in AArch32, so I wanted to
treat all versions as just 32-bit ARM boards.
I knew that they differ in list of devices so I thought that some kind
of configuration would have to be passed during booting process so
proper device drivers could be loaded and with proper configuration. I
checked that a solution for this problem (using one binary for different
ARM boards) used by at least Linux and U-boot is device tree. My plan
was (and still is unless something better is proposed) to try to
experiment with incorporating support for device tree to a platform
driver - if I correctly understand it is a place where similar
functionality is implemented for x86_64.
Browsing through bootstrap and core code in base-hw for past weeks made
me realise that to support one binary for different Pi devices which
have different generations of processors other problems would have to be
resolved.
If I understand correctly, currently when building base-hw for a specific
ARM board, all configuration is provided during the build process. It is
performed by:
a. specifying target processor version (-m for g++)
b. specifying list of files to compile and include to binary that
    contain implementations of functions that differ between different
    processor generations and to enable some functionality (e.g. for
    virtualization support for ARM)
c. constants in code - to provide proper memory ranges, MMIO addresses,
    etc. for device drivers
Don't get me wrong, striving for the stars is welcome, but I think the
goal of having a generic kernel image for all RPIs or even all ARM
devices is a bit too ambitious right now.
Although Linux ARM developers have been working on it for many years,
they aren't there yet. Look at current RPI distro images like Raspbian:
they deliver at least two different kernels _and_ multiple device
trees. The bootloader code has to decide which one is loaded.
Currently, there is no generic convention followed by all SoC/board
vendors, bootloaders, etc. We have to be pragmatic here.
Given the current state of e.g. the Raspberry Pi universe, where you
have to decide what needs to be loaded, I do not see in general the
advantage of differentiating between configuration data and some
small, statically linked image.
...
Having all this information during the build allows one to:

* optimize code by inlining even processor-specific code
* minimize the resulting binary (maybe not so important) and therefore
  the memory footprint on the running system (more important)

On the other hand, Wikipedia (here [2]) currently mentions 15 models of
Raspberry Pi, which differ in:

* processor generations (ARM11, Cortex-A7, Cortex-A53) - that breaks
  point (a)
* available devices (with/without ethernet, wireless, etc.) - that
  breaks point (b)
* runtime configurations that can be changed by configuring the
  firmware (e.g. a different partitioning of RAM between graphics and
  the operating system can be selected with a configuration option) -
  that contradicts (c)

Surely, those are too many dimensions to provide different fully
fledged system images. Also, we do not want too many platform
differentiations in the codebase. Again, I vote for pragmatism and for
targeting the different platforms step-by-step and not all-in-one.
Currently, I would think that having different targets for the
different architectures (Armv6, Armv7, Armv8) is a good starting
point. It is a natural boundary, and we can compile all components to
target the correct one.
Then the question is whether there is a mechanism in the
Broadcom SoC to identify the correct model at runtime, e.g. some
identification registers. If possible, we can of course hide the model
differences in the platform/bootstrap code within one image.
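
Whether the SoC offers a register that identifies the board model is left
open here. One thing that can be read at runtime on all of these cores is
the ARM Main ID Register (MIDR, read on AArch32 with
"mrc p15, 0, <Rt>, c0, c0, 0"), which at least tells the processor
generations apart. The sketch below only decodes a MIDR value that has
already been read; the names Cpu_core, midr_part and core_from_midr are
hypothetical, and the MIDR does not distinguish board models that share
the same core.

```cpp
#include <cassert>
#include <cstdint>

enum class Cpu_core { ARM1176, CORTEX_A7, CORTEX_A53, UNKNOWN };

// The primary part number occupies bits [15:4] of the MIDR.
constexpr unsigned midr_part(std::uint32_t midr)
{
    return (midr >> 4) & 0xfff;
}

// Map the part number to the core generations mentioned in this thread.
constexpr Cpu_core core_from_midr(std::uint32_t midr)
{
    switch (midr_part(midr)) {
    case 0xb76: return Cpu_core::ARM1176;
    case 0xc07: return Cpu_core::CORTEX_A7;
    case 0xd03: return Cpu_core::CORTEX_A53;
    default:    return Cpu_core::UNKNOWN;
    }
}
```

Bootstrap code could branch on such a value once, early, and keep the rest
of the image free of model checks.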
I would omit different runtime configurations of the firmware for the
moment and just support the default one.
...
When thinking about supporting different Pi devices (and more generally
other ARM boards) I think that there are two areas to consider:
A. support for runtime/startup configuration of devices that will have
    drivers running in different processes - this is a part that I knew
    about from the beginning of my experimentation. I still think that
    implementing support for device tree (or some alternative) is a
    proper solution for this part (what the method of passing
    configuration to device drivers would be is an open question)
B. support for passing configuration that is required by bootstrap and
    core. Here drivers are selected with C++ code e.g.:
    //using Serial   = Genode::Pl011_uart;
    using Serial   = Genode::Mini_uart;

Selecting the UART driver is an interesting example, as the RPI3B+ has
two UART devices, and which one is available is decided by the firmware
configuration (the mini UART is available by default and the Pl011 UART
is used internally for communication with Bluetooth, but this can be
changed in the configuration before starting the operating system).

Now questions:

Do you (Genode) plan to have or would like to have such configurable
support for different ARM devices?

I only state my point of view, which is not necessarily the way it
will go, but anyway: nowadays, configuration starts above core/kernel.
We should keep this for the time being to limit the risk of
over-engineering in this minimal, critical code-base. I can imagine that
we weaken this claim for the bootstrap component in the future,
because that code is critical in the boot-stage only, but then gets
thrown away anyway.
Above core, we need to know:
* What devices exist
* What resources they need
* Whether there are additional configuration aspects needed by the
  driver, beyond the resources, e.g.: operation mode of the PHY
  interface for ethernet devices ...
* How to power/clock the device at runtime, if dynamic power
  consumption is a topic
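
As a purely illustrative sketch, the four kinds of information in this
list could be captured in a per-device record like the one below. None of
these type or field names come from Genode; they are assumptions made to
show what a platform driver would have to know, independent of whether the
data originates from a device tree or from some other source.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of one memory-mapped I/O region.
struct Mmio_range { std::uint64_t base; std::uint64_t size; };

// Hypothetical per-device record mirroring the list above.
struct Device
{
    std::string             name;   // which device exists
    std::vector<Mmio_range> mmio;   // resources: register regions
    std::vector<unsigned>   irqs;   // resources: interrupt numbers
    std::string             mode;   // extra driver config, e.g. PHY mode
    std::string             clock;  // clock/power domain, empty if none
};

// Look up a device by name; returns nullptr if the platform lacks it.
inline Device const *find_device(std::vector<Device> const &devices,
                                 std::string const &name)
{
    for (Device const &d : devices)
        if (d.name == name) return &d;
    return nullptr;
}
```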
...

Do you think that creating a generic platform driver for ARM (to
support A) that works using information provided by a device tree
is generally a good idea? What alternatives do you consider
better?

All functionality described above is realized using open firmware /
flattened device tree in Linux and Co. I agree that it is appealing to
be able to access all the platform data for free. But this is only half
the truth, for the following reasons:
# Device tree support was a moving target within the last years
  with lots of changes in the implementation and in the trees
  themselves, and it is not foreseeable that this will change
# There are reams of device-specific attributes. Please, have a look
  at Documentation/devicetree in the Linux kernel. There are over
  3000 files. The whole "of_" bureaucracy in the Linux kernel
  shows its complexity.
# Different device trees and their "generic" language should not suggest
  that we will have one platform driver to rule them all. There are
  tons of "platform devices" in the Linux kernel necessary to provide
  the resources that toplevel devices need. That means for each SoC
  you still need a specific platform driver that supports all the
  multiplexing units for pins, power, clocking etc.
# Device trees describe also a lot of relations in between devices.
  For instance, a device might get its interrupts from the interrupt
  controller or from a GPIO controller. All routes to some kind of
  multiplexer device are part of the tree. It is much easier if you
  want to apply this configuration to a monolithic component containing
  all multiplexer devices than to split this in multiple components.
To sum it up: I'm not sure whether supporting device trees in Genode's
ARM platform driver is the right way to go, but I do not want to
exclude it completely. Maybe I'm wrong and I over-estimate the
complexity of the core functionality.
Surely we need something comparable at some point. Maybe, it would be
good to start it as an experiment for one specific SoC.
...

Do you see any way to implement support for B?

Surely, but at least here at Genode Labs it is not highly prioritized
right now. Currently, we try to limit the platform support and care to
the NXP i.MX* platforms. Here, we plan to support i.MX8 as soon as
possible, which has a Cortex A53 too.
...
I'd very much like to receive some comments about it.
Thank you very much for sharing your thoughts.
Best regards
Stefan
...
-- 
Tomasz Gajewski
[1] https://github.com/tomga/genode/tree/rpi3bplus
[2] https://en.wikipedia.org/wiki/Raspberry_Pi#Specifications

Genode users mailing list
users@lists.genode.org
https://lists.genode.org/listinfo/users
-- 
Stefan Kalkowski
Genode Labs

https://github.com/skalk | https://genode.org
