memcpy_cpu on 64bit arm

Michael Grunditz michael.grunditz at gmail.com
Fri Jan 20 20:15:12 CET 2023


> There is no particular reason why the implementation is (i.e. "was")
> empty. You can find a recent commit on the staging branch that applies a
> few obvious optimisations to all architectures, though:
> https://github.com/genodelabs/genode/commit/4d06661d7c3f7b798ec8228f04983bd4ae7cddcf
>
Is there any content in the commit?

> For 32bit arm, I optimised the memcpy_cpu implementation a while ago
> (see Issue #4456). Interestingly, I could not see any improvements when
> using neon, at least on arm v7. I got the impression that the
> instruction density is not an issue when using the multi-word
> load/store (ldm/stm).

OK, I think that makes more sense now. And yes, for v7 it might not
help. In my experience, the only way NEON can be used effectively is in
a test/copy routine that handles all sizes. Most arm/arm64 libc
implementations do that.
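
A rough sketch of the test/copy idea, in portable C rather than assembly:
test the size first, then branch to a copy strategy suited to that range.
The name copy_dispatch and the 16/32/64-byte thresholds are assumptions
made for illustration only; they are not taken from Genode or from any
particular libc. The fixed-size memcpy calls stand in for the wide
loads/stores (for example NEON register pairs) that a real assembly
routine would use, assuming the compiler expands them inline.

```c
#include <stddef.h>
#include <string.h>

/* copy_dispatch is a hypothetical name, not part of Genode or any libc.
 * Like memcpy, it assumes that the two buffers do not overlap. */
static void *copy_dispatch(void *dst, const void *src, size_t n)
{
    unsigned char       *d = dst;
    const unsigned char *s = src;

    /* Tiny copies: a plain byte loop, no setup cost. */
    if (n < 16) {
        while (n--) *d++ = *s++;
        return dst;
    }

    /* 16..32 bytes: two 16-byte moves that may overlap in the middle,
     * so no loop and no per-byte tail handling is needed. */
    if (n <= 32) {
        unsigned char head[16], tail[16];
        memcpy(head, s, 16);
        memcpy(tail, s + n - 16, 16);
        memcpy(d, head, 16);
        memcpy(d + n - 16, tail, 16);
        return dst;
    }

    /* Larger copies: 64-byte blocks, then 16-byte chunks, then bytes. */
    while (n >= 64) { memcpy(d, s, 64); d += 64; s += 64; n -= 64; }
    while (n >= 16) { memcpy(d, s, 16); d += 16; s += 16; n -= 16; }
    while (n--) *d++ = *s++;
    return dst;
}
```

The point of the size test is that small copies never pay for the
block-loop setup, which is where a plain wide-register loop tends to
lose against ldm/stm-style code.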

I would like to try this, but I have no idea where to put the .s file
so that it gets built.
I don't want to inline it since it is quite big.

Thanks,

Michael


