memcpy_cpu on 64bit arm
Johannes Schlatow
johannes.schlatow at genode-labs.com
Fri Jan 20 17:05:40 CET 2023
Hi Michael,
On Fri, 20 Jan 2023 16:40:10 +0100
Michael Grunditz <michael.grunditz at gmail.com> wrote:
> Hello,
>
> Is there any particular reason why it is empty?
> My rect copy to fb in riscos uses neon. It is
> a speed gain of about 40% compared to word/long word
> copy from c. But I don't know how much it affects Genode.
>
> It seems like it ends up in /* eight bytes chunks */ but isn't that a
> byte copy?
>
There is no particular reason why the implementation was empty. A recent
commit on the staging branch applies a few obvious optimisations to all
architectures, though:
https://github.com/genodelabs/genode/commit/4d06661d7c3f7b798ec8228f04983bd4ae7cddcf
For 32-bit ARM, I optimised the memcpy_cpu implementation a while ago
(see Issue #4456). Interestingly, I could not see any improvement when
using NEON, at least on ARMv7. I got the impression that instruction
density is not an issue when using the multi-word load/store
instructions (ldm/stm).
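
To illustrate the difference between a chunked copy and a byte copy, here
is a minimal sketch in plain C. It is not the Genode code: the names
copy_chunks_of_8 and copy_with_tail are made up for this example, and it
assumes a hosted compiler where a fixed-size memcpy of eight bytes is
typically turned into a single 64-bit load and store.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not Genode's memcpy_cpu): copy as much as possible
 * in eight-byte chunks and return the number of bytes that were NOT copied.
 */
static size_t copy_chunks_of_8(void *dst, const void *src, size_t size)
{
    unsigned char       *d = dst;
    const unsigned char *s = src;

    for (; size >= 8; size -= 8, d += 8, s += 8) {
        uint64_t chunk;
        /* fixed-size copies avoid alignment and aliasing assumptions */
        memcpy(&chunk, s, sizeof(chunk));
        memcpy(d, &chunk, sizeof(chunk));
    }
    return size; /* 0..7 bytes remain for the caller */
}

/* Hypothetical wrapper: chunked copy first, then a byte loop for the tail. */
static void *copy_with_tail(void *dst, const void *src, size_t size)
{
    size_t rest = copy_chunks_of_8(dst, src, size);

    unsigned char       *d = (unsigned char *)dst + (size - rest);
    const unsigned char *s = (const unsigned char *)src + (size - rest);

    while (rest--)
        *d++ = *s++;

    return dst;
}
```

With a 21-byte buffer, copy_chunks_of_8 moves 16 bytes in two chunks and
reports 5 bytes left, which the byte loop in copy_with_tail then handles.
Whether a loop like this beats ldm/stm or NEON on a given core is something
only a measurement can tell.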
Johannes