Hi,
while trying to get the framebuffer driver working on the rpi3b+, I ran into problems with memory caching. The GPU reported that the request message had been processed, but I could not see the result: reading the memory always returned the content of the request message, not the response.
Before I found a simple fix (asking for an uncached dataspace), I tried calling 'Kernel::update_data_region', but it didn't work for me. During my experiments I dumped memory from within 'Kernel::Thread::_call_update_data_region()' for the virtual address range passed to this function, and that worked.
In that function there is a comment stating that, for threads other than core, the kernel operates in a different address space than the caller. In that case a different cache maintenance method is called: 'clean_invalidate_data_cache', which should clean and invalidate the whole cache. For core threads, 'clean_invalidate_data_cache_by_virt_region' is used.
Successfully dumping the memory passed to _call_update_data_region() (using its virtual address) made me think that the comment about address spaces is wrong, so, as an experiment, I changed the code to always invalidate the cache for the given region only. This change [1] resolved the issue of not seeing the GPU's responses regarding framebuffer setup.
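The difference between the old and the new behaviour can be sketched roughly as follows. This is a simplified mock and not the actual kernel code: 'Mock_cpu', 'update_data_region_old', and 'update_data_region_new' are hypothetical names used only for illustration, and the two cache methods merely count how often they are called.

```cpp
#include <cstddef>
#include <cstdint>

/* Hypothetical stand-in for the CPU cache interface; it only counts calls. */
struct Mock_cpu
{
    unsigned whole_cache_ops = 0;
    unsigned region_ops      = 0;

    /* Stands for cleaning and invalidating the whole data cache. */
    void clean_invalidate_data_cache() { ++whole_cache_ops; }

    /* Stands for cleaning and invalidating one virtual address range. */
    void clean_invalidate_data_cache_by_virt_region(std::uintptr_t, std::size_t)
    {
        ++region_ops;
    }
};

/* Behaviour before the change: only core threads get the region variant. */
inline void update_data_region_old(Mock_cpu &cpu, bool caller_is_core,
                                   std::uintptr_t base, std::size_t size)
{
    if (caller_is_core)
        cpu.clean_invalidate_data_cache_by_virt_region(base, size);
    else
        cpu.clean_invalidate_data_cache();
}

/* Behaviour after the change: every caller gets the region variant. */
inline void update_data_region_new(Mock_cpu &cpu,
                                   std::uintptr_t base, std::size_t size)
{
    cpu.clean_invalidate_data_cache_by_virt_region(base, size);
}
```

The second variant is only valid if the caller's virtual addresses are also valid while the kernel executes, which is exactly the assumption in question here.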
If I'm not mistaken, then:
1. The comment in 'Kernel::Thread::_call_update_data_region()' is no longer true, and the code can be changed to always call the more efficient version that invalidates the cache only for the given memory region.
2. 'clean_invalidate_data_cache' does not work properly on the rpi3b+. I wouldn't be really surprised, as I used the Cortex A15 code already available in Genode without checking whether the Cortex A53 requires different code (even in AArch32 mode).
Please let me know whether the kernel is now mapped into every address space (as the aforementioned comment states as a future goal) and whether my change is correct. If that is not the case, can you suggest another possible explanation?
Tomasz Gajewski
PS. As a minor addition, I found a trivial function documentation bug, which I fixed in [2].
[1] https://github.com/tomga/genode/commit/5a5adc06d9889129f2f1efd9a36be0ef52bb2...
[2] https://github.com/tomga/genode/commit/f25fad3c2957b843b549665cd7bcb50bec1d1...
Hello Tomasz,
On Sun, Mar 03, 2019 at 01:19:06AM +0100, Tomasz Gajewski wrote:
- The comment in 'Kernel::Thread::_call_update_data_region()' is no longer true, and the code can be changed to always call the more efficient version that invalidates the cache only for the given memory region.
You are right, that comment and branch are an artefact from the time when the kernel used a separate address space. It should always invalidate only the affected cache lines instead of the whole cache. In my defence, those functions are actually not used by any components. The only use case I know of is the JavaScript JIT compiler of Arora. But anyway, it needs to be fixed.
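As a rough illustration of what invalidating only the affected cache lines means, here is a minimal sketch. It is not the Genode implementation: 'for_each_cache_line' is a hypothetical helper, the line size is passed in as a parameter instead of being read from the CPU, and the per-line maintenance instruction is replaced by a callback so that the address arithmetic can be checked on any host.

```cpp
#include <cstddef>
#include <cstdint>

/*
 * Calls 'line_op(addr)' once for every cache line that overlaps the range
 * [base, base + size) and returns the number of lines visited.
 *
 * 'line_size' must be a power of two. On real hardware, 'line_op' would
 * issue the clean-and-invalidate-by-virtual-address operation for 'addr'.
 */
template <typename LINE_OP>
std::size_t for_each_cache_line(std::uintptr_t base, std::size_t size,
                                std::size_t line_size, LINE_OP &&line_op)
{
    if (size == 0)
        return 0;

    std::uintptr_t const mask  = ~(static_cast<std::uintptr_t>(line_size) - 1);
    std::uintptr_t const first = base & mask;              /* round down */
    std::uintptr_t const last  = (base + size - 1) & mask; /* last line  */

    std::size_t count = 0;
    for (std::uintptr_t addr = first; ; addr += line_size) {
        line_op(addr);
        ++count;
        if (addr == last)
            break;
    }
    return count;
}
```

The start address is rounded down to a line boundary, so a region that is not line-aligned touches one more line than its size alone suggests.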
- 'clean_invalidate_data_cache' does not work properly on the rpi3b+. I wouldn't be really surprised, as I used the Cortex A15 code already available in Genode without checking whether the Cortex A53 requires different code (even in AArch32 mode).
Well, this leaves me puzzled. I would have assumed that at least the overall cross-core cache invalidation should work here too. Actually, there are not many differences between the different ARMv7 cores regarding the data-cache clean/invalidate operations, apart from the special outer L2 cache of Cortex A9 CPUs. Maybe the cross-core cache coherency is not set up appropriately? To me it is not quite clear how the SMP setup is done on Cortex A53. When looking into the manual, it seems to me that the ACTLR register is implemented differently, but I'm not sure whether I understood it correctly. On Cortex A9 / A15 CPUs, you had to enable the coherency units using the SMP bit in the ACTLR register before enabling the MMU in multi-core environments.
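For reference, a minimal sketch of the bit manipulation involved. It rests on one assumption that should be checked against the respective technical reference manuals: the coherency-enable bit is bit 6, both for the SMP bit of ACTLR on Cortex A9 / A15 and for the SMPEN bit of CPUECTLR, a separate register, on Cortex A53. The helper names are hypothetical, and the sketch only computes register values; it does not read or write any system register.

```cpp
#include <cstdint>

/* Assumed bit position of SMP (ACTLR on A9/A15) and SMPEN (CPUECTLR on A53). */
constexpr unsigned SMP_BIT = 6;

/* Returns the register value with the coherency-enable bit set. */
constexpr std::uint64_t with_smp_enabled(std::uint64_t reg)
{
    return reg | (std::uint64_t{1} << SMP_BIT);
}

/* Reports whether the coherency-enable bit is set in a register value. */
constexpr bool smp_enabled(std::uint64_t reg)
{
    return ((reg >> SMP_BIT) & 1) != 0;
}
```

On real hardware the value would be read from and written back to the system register with the appropriate coprocessor access instructions, before caches and MMU are enabled.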
Regards Stefan
Genode users mailing list users@lists.genode.org https://lists.genode.org/listinfo/users
Hi Tomasz,
On Mon, Mar 04, 2019 at 12:19:22PM +0100, Stefan Kalkowski wrote:
You are right, that comment and branch are an artefact from the time when the kernel used a separate address space. It should always invalidate only the affected cache lines instead of the whole cache. In my defence, those functions are actually not used by any components. The only use case I know of is the JavaScript JIT compiler of Arora. But anyway, it needs to be fixed.
Now, I've revised that part. You might have a look at:
https://github.com/genodelabs/genode/issues/2699
Thanks again for your notice.
Regards Stefan
-- Stefan Kalkowski Genode labs
https://github.com/skalk | https://genode.org