memcpy_cpu on 64-bit ARM

Johannes Schlatow johannes.schlatow at genode-labs.com
Fri Jan 20 17:05:40 CET 2023


Hi Michael,

On Fri, 20 Jan 2023 16:40:10 +0100
Michael Grunditz <michael.grunditz at gmail.com> wrote:

> Hello,
> 
> Is there any particular reason why it is empty?
> My rectangle copy to the framebuffer in RISC OS uses NEON. That gives
> a speed gain of about 40% compared to a word/long-word
> copy in C. But I don't know how much it would affect Genode.
> 
> It seems like it ends up in /* eight bytes chunks */, but isn't that a
> byte copy?
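For illustration only: a loop that really copies in eight-byte chunks moves one 64-bit word per iteration rather than eight single bytes. The sketch below uses a hypothetical helper (`copy_chunks8` is not part of Genode) and assumes non-overlapping buffers. It goes through a fixed-size `memcpy` into a `uint64_t`, which avoids alignment and aliasing problems and which optimizing compilers typically lower to a single load and a single store.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not Genode's memcpy_cpu. Copies len bytes from
 * src to dst (buffers must not overlap), moving eight bytes per iteration
 * through a 64-bit temporary, then finishes the remainder byte by byte.
 * Returns the number of eight-byte chunks copied. */
static size_t copy_chunks8(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t chunks = 0;

    /* eight-byte chunks: one 64-bit load and one 64-bit store each;
     * the fixed-size memcpy is the portable way to express a possibly
     * unaligned 64-bit access */
    for (; len >= 8; len -= 8, d += 8, s += 8, chunks++) {
        uint64_t word;
        memcpy(&word, s, sizeof word);
        memcpy(d, &word, sizeof word);
    }

    /* remaining 0..7 bytes */
    while (len--)
        *d++ = *s++;

    return chunks;
}
```

Whether an existing loop under such a comment ends up as 64-bit accesses or as byte moves can be checked by disassembling the built binary.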
> 

There is no particular reason why the implementation is (i.e. "was")
empty. You can find a recent commit on the staging branch that applies a
few obvious optimisations to all architectures, though:
https://github.com/genodelabs/genode/commit/4d06661d7c3f7b798ec8228f04983bd4ae7cddcf

For 32-bit ARM, I optimised the memcpy_cpu implementation a while ago
(see Issue #4456). Interestingly, I could not see any improvement when
using NEON, at least on ARMv7. My impression is that instruction
density is not an issue when using the multi-word load/store
instructions (ldm/stm).
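To make the ldm/stm point concrete: a multi-word copy loads a block of several words into registers with one instruction and writes them back with another. The portable C sketch below imitates that pattern with eight 32-bit temporaries per 32-byte block, loading all eight before storing any. `copy_blocks32` is a hypothetical name and this is not the code from Issue #4456; whether a compiler actually emits ldm/stm on ARMv7 (or ldp/stp on AArch64) for it depends on the compiler and its flags.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical block copy, not Genode's implementation. Moves 32 bytes
 * per iteration through eight 32-bit temporaries, mirroring a load-multiple
 * followed by a store-multiple of eight registers. Buffers must not overlap.
 * Returns the number of 32-byte blocks copied. */
static size_t copy_blocks32(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t blocks = 0;

    for (; len >= 32; len -= 32, d += 32, s += 32, blocks++) {
        uint32_t r[8]; /* stands in for the register list of ldm/stm */

        /* "load multiple": read all eight words first */
        for (int i = 0; i < 8; i++)
            memcpy(&r[i], s + 4 * i, sizeof r[i]);

        /* "store multiple": then write all eight words */
        for (int i = 0; i < 8; i++)
            memcpy(d + 4 * i, &r[i], sizeof r[i]);
    }

    /* tail: fewer than 32 bytes left */
    while (len--)
        *d++ = *s++;

    return blocks;
}
```

A NEON variant would move the same block through 128-bit vector registers instead; the measurement above suggests that on ARMv7 this does not necessarily pay off once the scalar loop already transfers many words per instruction.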

Johannes
