Hi,
the last weeks of attempts brought me some small progress with the code, a better understanding of some aspects of the ARM (AArch32) architecture, and some knowledge of how the low-level hw kernel is implemented.
I have some thoughts and questions about how some things should or could be implemented that I'd like to discuss.
Firstly, I think that in my ignorance I took the wrong path by experimenting with the Raspberry Pi 3B+. It has a Cortex-A53 (ARMv8-A) processor, but as I want to target AArch32 (and initially without SMP), I thought that modifying the current rpi target was the best choice. Now I think that I should have started with the newest ARM processor supported by hw (Cortex-A15, if I understand correctly), and then I would not have been stopped by the issues I had. Or maybe not...
When I last wrote about my progress, I couldn't get past the following code:
sctlr = Cpu::Sctlr::read();
Cpu::Sctlr::C::set(sctlr, 1); // enable data cache
Cpu::Sctlr::I::set(sctlr, 1); // enable instruction cache
Cpu::Sctlr::M::set(sctlr, 1); // enable mmu
Cpu::Sctlr::write(sctlr);
After reading and making various attempts, I found that enabling the instruction cache does not cause problems, and that enabling either the data cache or the MMU on its own lets execution go further, but enabling both does not. I thought that I could live without the data cache for the initial experiments, but I soon found out that I was wrong.
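For reference, here is a minimal host-side sketch of what these set() calls do to the register value, assuming the ARMv7-A/AArch32 SCTLR layout (M is bit 0, C is bit 2, I is bit 12). The constant and helper names are hypothetical and are not Genode's Cpu::Sctlr API; on the real target the value is read and written through CP15.

```cpp
#include <cstdint>

// Bit positions in the AArch32 SCTLR (ARMv7-A layout).
constexpr std::uint32_t SCTLR_M = 1u << 0;   // MMU enable
constexpr std::uint32_t SCTLR_C = 1u << 2;   // data/unified cache enable
constexpr std::uint32_t SCTLR_I = 1u << 12;  // instruction cache enable

// Hypothetical helper: value with MMU, data cache and instruction cache on.
// This corresponds to the combination that did not work for me.
constexpr std::uint32_t enable_mmu_and_caches(std::uint32_t sctlr)
{
    return sctlr | SCTLR_C | SCTLR_I | SCTLR_M;
}

// Hypothetical helper: MMU and instruction cache on, data cache off.
constexpr std::uint32_t enable_mmu_icache_only(std::uint32_t sctlr)
{
    return (sctlr | SCTLR_I | SCTLR_M) & ~SCTLR_C;
}
```

Starting from a zero value, the first helper yields 0x1005 and the second 0x1001.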
After many attempts I found the place where it halted. It was deep inside Genode::log, in ... assembly, in cmpxchg in atomic.h. Reading up on it gave me another unpleasant surprise: it will not work if the data cache is not enabled... So I had to go back.
After two weeks of poking around I finally found a workaround that allowed me to go further. I changed Page_flags for all types from CACHED to UNCACHED, which made it possible to get past enabling the MMU with the data cache enabled (of course without real caching) and past the spinlock in atomic.h.
The next thing was to get the UART working. Unfortunately, version 3 has a different UART device enabled by default (it has two), and it required a different driver, which I implemented.
Seeing "kernel initialized" come through the serial connection was a real pleasure after analyzing logs in memory for two weeks.
So I thought that this is a good moment to write down my thoughts and questions.
For writing traces to memory I created a simple utility (a set of macros in assembly and C++) that writes simple debug values to a buffer in memory. I used it to diagnose what was going on before the serial connection worked.
It is specified by a buffer address and size. A pointer to the current position in the buffer is stored at the beginning of the buffer. A 1:1 mapping is inserted into the TLB for this region, and I had to make changes to the virtual memory layout of core. Currently the addresses are hardcoded, but I can polish it to a state where it could be enabled/disabled with a build option or some defines in one place in the code, if you'd like to use this. I pushed the current state of my work [1] (without any cleanup yet). The utility macros are in:
- repos/base/include/base/memtrace.h
- repos/base-hw/src/bootstrap/spec/arm/crt0.s
Would you be interested in adding something like this to Genode? I would create an issue and propose a version.
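To make the mechanism concrete, here is a simplified host-side sketch of such a trace buffer. It is not the code from memtrace.h: the struct and its members are hypothetical, the position is kept as a word index instead of a pointer, and the wrap-around behaviour when the buffer is full is an assumption.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical trace region: word 0 holds the current write position
// (as an index), the trace values follow from word 1 onwards.
struct Mem_trace
{
    volatile std::uint32_t *base;   // start of the region (1:1 mapped on target)
    std::size_t             words;  // size of the region in 32-bit words

    // Reset the position so that the next value lands in the first entry.
    void init() { base[0] = 1; }

    // Append one debug value; wraps to the first entry when the region is full.
    void put(std::uint32_t value)
    {
        std::uint32_t pos = base[0];
        if (pos >= words)
            pos = 1;
        base[pos] = value;
        base[0]   = pos + 1;
    }
};
```

Keeping the position inside the region itself means that a memory dump alone is enough to find the most recent entry.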
I have some general doubts about how some issues could be resolved. I started experimenting with the idea of making a Sculpt for the Raspberry Pi possible, and I thought it could be created in such a way that a single binary could run on any Pi. I knew that they are all ARM devices and that, even though some are 64-bit, they can work in AArch32, so I wanted to treat all versions as just 32-bit ARM boards.
I knew that they differ in their lists of devices, so I thought that some kind of configuration would have to be passed during the boot process so that the proper device drivers could be loaded with the proper configuration. I found that the solution to this problem (using one binary for different ARM boards) used by at least Linux and U-Boot is the device tree. My plan was (and still is, unless something better is proposed) to experiment with incorporating device tree support into a platform driver. If I understand correctly, this is the place where similar functionality is implemented for x86_64.
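As a first step for such experiments, recognizing a flattened device tree blob only needs its header. A minimal sketch, assuming the standard FDT header layout (big-endian 32-bit fields, magic 0xd00dfeed at offset 0, total size at offset 4); the function names are hypothetical and this is not existing Genode code.

```cpp
#include <cstddef>
#include <cstdint>

// Read a big-endian 32-bit value, as used by all FDT header fields.
inline std::uint32_t read_be32(const unsigned char *p)
{
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16)
         | (std::uint32_t(p[2]) <<  8) |  std::uint32_t(p[3]);
}

constexpr std::uint32_t FDT_MAGIC = 0xd00dfeed;

// Hypothetical helper: returns the blob's total size if 'blob' starts with
// a plausible FDT header that fits into 'avail' bytes, or 0 otherwise.
inline std::uint32_t fdt_total_size(const unsigned char *blob, std::size_t avail)
{
    if (avail < 8)
        return 0;                               // need magic + totalsize
    if (read_be32(blob) != FDT_MAGIC)
        return 0;
    std::uint32_t total = read_be32(blob + 4);  // 'totalsize' field
    return (total <= avail) ? total : 0;
}
```

Walking the structure block to find nodes and properties would build on top of this check.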
Browsing through the bootstrap and core code in base-hw over the past weeks made me realise that supporting one binary for different Pi devices, which have different generations of processors, would require other problems to be resolved as well.
If I understand correctly, when building base-hw for a specific ARM board, all configuration is currently provided during the build process. This is done by:
a. specifying the target processor version (-m for g++)
b. specifying the list of files to compile and include in the binary; these contain the implementations of functions that differ between processor generations and enable some functionality (e.g. virtualization support for ARM)
c. constants in the code - to provide proper memory ranges, MMIO addresses, etc. for device drivers
Having all this information at build time makes it possible to:
- optimize the code by inlining even processor-specific code
- minimize the resulting binary (maybe not so important) and therefore the memory footprint of the running system (more important)
On the other hand, Wikipedia (here [2]) currently lists 15 models of Raspberry Pi, which differ in:
- processor generations (ARM11, Cortex-A7, Cortex-A53) - that breaks point (a)
- available devices (with/without ethernet, wireless, etc.) - that breaks point (b)
- runtime configurations that can be changed by configuring the firmware (e.g. the partitioning of RAM between graphics and the operating system can be set with a configuration option) - that contradicts (c)
When thinking about supporting different Pi devices (and, more generally, other ARM boards), I think that there are two areas to consider:
A. support for runtime/startup configuration of devices whose drivers run in separate processes - this is the part that I knew about from the beginning of my experiments. I still think that implementing support for device tree (or some alternative) is the proper solution for this part (how the configuration would be passed to the device drivers is an open question)
B. support for passing the configuration that is required by bootstrap and core. Here drivers are selected in C++ code, e.g.:
//using Serial = Genode::Pl011_uart;
using Serial = Genode::Mini_uart;
Selecting the UART driver is an interesting example, as the RPI3B+ has two UART devices and which one is available is decided by the firmware configuration (the mini UART is available by default and the PL011 UART is used internally for communication with Bluetooth, but this can be changed in the configuration before the operating system starts).
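Just to illustrate what a runtime choice instead of the compile-time alias could look like (not a proposal for the actual design), here is a small host-side sketch. All names are hypothetical stand-ins and not Genode's drivers; the source of the selection value (firmware configuration or device tree) is an assumption.

```cpp
#include <string>

enum class Uart_kind { PL011, MINI };

// Hypothetical common interface that both UART drivers would implement.
struct Serial
{
    virtual void put_char(char c) = 0;
    virtual ~Serial() = default;
};

// Host-side stand-in for a driver: collects output instead of touching MMIO.
struct Fake_uart : Serial
{
    std::string out;
    void put_char(char c) override { out += c; }
};

// Pick the console at startup from a value that would come from the
// firmware configuration or a device tree (assumption).
inline Serial &select_serial(Uart_kind kind, Serial &pl011, Serial &mini)
{
    return (kind == Uart_kind::PL011) ? pl011 : mini;
}
```

The obvious cost is that calls go through a virtual function, so the inlining of processor-specific or device-specific code mentioned above is lost for this path.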
Now questions:
1. Do you (Genode) plan to have, or would you like to have, such configurable support for different ARM devices?
2. Do you think that creating a generic platform driver for ARM (to support A) that works using information provided by device tree support is generally a good idea? What alternatives do you consider better?
3. Do you see any way to implement support for B?
I'd very much like to receive some comments on this.