Hello,
Is there any particular reason why it is empty? My rect copy to the framebuffer in RISC OS uses NEON. It gives a speed gain of about 40% compared to a word/long-word copy written in C. But I don't know how much it would affect Genode.
It seems like it ends up in /* eight bytes chunks */ but isn't that a byte copy?
I have the feeling that in my case, with blit, a memory copy without cropping is faster than cropping every time. RAM accesses have proved to be very fast on this hardware, so the extra cycles for cropping might be the bottleneck. Combined with a NEON run-through, testing and copying without leaving the routine could probably give a big performance gain.
But.
I am happy, as always, to be proved wrong!
Michael
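To make the clipping question above concrete, here is a minimal C++ sketch of the two variants being compared: a blit that clips the rectangle against both surfaces on every call, and one that trusts the caller and only copies. The `Surface` type and both function names are hypothetical and exist only for illustration; they are not taken from Genode or RISC OS.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical surface type for illustration only (32-bit pixels, row-major).
struct Surface
{
    int width, height;
    std::vector<std::uint32_t> pixels;

    Surface(int w, int h)
    : width(w), height(h),
      pixels(static_cast<std::size_t>(w) * static_cast<std::size_t>(h), 0) { }

    // Index of pixel (x, y) within the pixel array.
    std::size_t offset(int x, int y) const
    {
        return static_cast<std::size_t>(y) * static_cast<std::size_t>(width)
             + static_cast<std::size_t>(x);
    }
};

// Copy a w x h rectangle without any clipping. The caller must guarantee
// that the rectangle lies inside both surfaces.
inline void blit_unclipped(Surface const &src, int sx, int sy,
                           Surface &dst, int dx, int dy, int w, int h)
{
    if (w <= 0 || h <= 0) return;

    // Debug-only sanity checks, compiled out when NDEBUG is defined.
    assert(sx >= 0 && sy >= 0 && sx + w <= src.width && sy + h <= src.height);
    assert(dx >= 0 && dy >= 0 && dx + w <= dst.width && dy + h <= dst.height);

    for (int row = 0; row < h; ++row)
        std::memcpy(dst.pixels.data() + dst.offset(dx, dy + row),
                    src.pixels.data() + src.offset(sx, sy + row),
                    static_cast<std::size_t>(w) * sizeof(std::uint32_t));
}

// Same copy, but the rectangle is first clipped against both surfaces.
inline void blit_clipped(Surface const &src, int sx, int sy,
                         Surface &dst, int dx, int dy, int w, int h)
{
    // Skip the part that starts left of or above either surface.
    int const left = std::max({0, -sx, -dx});
    int const top  = std::max({0, -sy, -dy});
    sx += left; dx += left; w -= left;
    sy += top;  dy += top;  h -= top;

    // Shrink the rectangle so that it ends inside both surfaces.
    w = std::min({w, src.width  - sx, dst.width  - dx});
    h = std::min({h, src.height - sy, dst.height - dy});

    if (w <= 0 || h <= 0) return;

    blit_unclipped(src, sx, sy, dst, dx, dy, w, h);
}
```

In this sketch the clipping is a handful of integer operations per call, not per pixel, so whether it matters depends on how small and how frequent the blits are. That is something only a measurement on the target hardware can settle.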
Hi Michael,
On Fri, 20 Jan 2023 16:40:10 +0100 Michael Grunditz michael.grunditz@gmail.com wrote:
> Hello,
> Is there any particular reason why it is empty? My rect copy to the framebuffer in RISC OS uses NEON. It gives a speed gain of about 40% compared to a word/long-word copy written in C. But I don't know how much it would affect Genode.
> It seems like it ends up in /* eight bytes chunks */ but isn't that a byte copy?
There is no particular reason why the implementation is (i.e. "was") empty. You can find a recent commit on the staging branch that applies a few obvious optimisations to all architectures, though: https://github.com/genodelabs/genode/commit/4d06661d7c3f7b798ec8228f04983bd4...
For 32-bit ARM, I optimised the memcpy_cpu implementation a while ago (see Issue #4456). Interestingly, I could not see any improvement when using NEON, at least on ARMv7. I got the impression that instruction density is not an issue when using the multi-word load/store instructions (ldm/stm).
Johannes
> There is no particular reason why the implementation is (i.e. "was") empty. You can find a recent commit on the staging branch that applies a few obvious optimisations to all architectures, though: https://github.com/genodelabs/genode/commit/4d06661d7c3f7b798ec8228f04983bd4...
Is there any content in the commit?
> For 32-bit ARM, I optimised the memcpy_cpu implementation a while ago (see Issue #4456). Interestingly, I could not see any improvement when using NEON, at least on ARMv7. I got the impression that instruction density is not an issue when using the multi-word load/store instructions (ldm/stm).
OK, I think it makes more sense now. And yes, for v7 it might not help. In my experience, the only way NEON can be used effectively is in a combined test/copy routine that handles all sizes. Most arm/arm64 libc implementations do that.
I would like to try this, but I have no idea where to put the .s file in order to build it. I don't want to have it inline since it is quite big.
Thanks,
Michael
On Fri, 20 Jan 2023 at 20:15, Michael Grunditz michael.grunditz@gmail.com wrote:
>> There is no particular reason why the implementation is (i.e. "was") empty. You can find a recent commit on the staging branch that applies a few obvious optimisations to all architectures, though: https://github.com/genodelabs/genode/commit/4d06661d7c3f7b798ec8228f04983bd4...
> Is there any content in the commit?
>> For 32-bit ARM, I optimised the memcpy_cpu implementation a while ago (see Issue #4456). Interestingly, I could not see any improvement when using NEON, at least on ARMv7. I got the impression that instruction density is not an issue when using the multi-word load/store instructions (ldm/stm).
> OK, I think it makes more sense now. And yes, for v7 it might not help. In my experience, the only way NEON can be used effectively is in a combined test/copy routine that handles all sizes. Most arm/arm64 libc implementations do that.
> I would like to try this, but I have no idea where to put the .s file in order to build it. I don't want to have it inline since it is quite big.
I have something that seems to work, even though I get a crash from test-log that I haven't solved. It could be because the .S file is built in every component. So the question is, where do I put it!?! It needs to live in "base", I guess. But I don't know how. The rest of the system doesn't resolve the symbol.
/Michael
> I have something that seems to work, even though I get a crash from test-log that I haven't solved. It could be because the .S file is built in every component. So the question is, where do I put it!?! It needs to live in "base", I guess. But I don't know how. The rest of the system doesn't resolve the symbol.
I added it to base.mk. That seems OK, but it should be in a platform file :-) Anyway, init crashes somewhere.
2043 MiB RAM and 64533 caps assigned to init
no RM attachment (READ pf_addr=0x81 pf_ip=0x9f3f4 from pager_object: pd='init' thread='init')
Warning: page fault, pager_object: pd='init' thread='init' ip=0x9f3f4 fault-addr=0x81 type=no-page
/Michael
On Mon, 23 Jan 2023 at 14:15, Michael Grunditz michael.grunditz@gmail.com wrote:
>> I have something that seems to work, even though I get a crash from test-log that I haven't solved. It could be because the .S file is built in every component. So the question is, where do I put it!?! It needs to live in "base", I guess. But I don't know how. The rest of the system doesn't resolve the symbol.
> I added it to base.mk. That seems OK, but it should be in a platform file :-) Anyway, init crashes somewhere.
> 2043 MiB RAM and 64533 caps assigned to init
> no RM attachment (READ pf_addr=0x81 pf_ip=0x9f3f4 from pager_object: pd='init' thread='init')
> Warning: page fault, pager_object: pd='init' thread='init' ip=0x9f3f4 fault-addr=0x81 type=no-page
I *think* it fails right at the start of the first child. It seems OK around startup. The last memcpy seems OK too: I have printed both buffers and they are identical.
/Michael
Hello,
On Mon, Jan 23, 2023 at 15:56:48 CET, Michael Grunditz wrote:
> On Mon, 23 Jan 2023 at 14:15, Michael Grunditz michael.grunditz@gmail.com wrote:
>> Anyway, init crashes somewhere.
>> 2043 MiB RAM and 64533 caps assigned to init
>> no RM attachment (READ pf_addr=0x81 pf_ip=0x9f3f4 from pager_object: pd='init' thread='init')
>> Warning: page fault, pager_object: pd='init' thread='init' ip=0x9f3f4 fault-addr=0x81 type=no-page
> I *think* it fails right at the start of the first child. It seems OK around startup. The last memcpy seems OK too: I have printed both buffers and they are identical.
I'm wondering why you replaced the general memcpy() in the first place, as this has maximal (potentially negative) impact in many places. Alternatively, you could just optimize your framebuffer blitting and proceed with your porting work.
Also, it is not known which NEON instructions and registers your memcpy utilizes. I'm not an ARM expert but: Does the current FPU-switching implementation of base-hw suffice or do you need to save/restore extended state? As you do not share your developments (i.e. source code) with the public it is hard to provide specific help.
Any information on which code is at 0x9f3f4 in ld.lib.so?
Greets
On Tue, 24 Jan 2023 at 08:21, Christian Helmuth christian.helmuth@genode-labs.com wrote:
> Hello,
> On Mon, Jan 23, 2023 at 15:56:48 CET, Michael Grunditz wrote:
>> On Mon, 23 Jan 2023 at 14:15, Michael Grunditz michael.grunditz@gmail.com wrote:
>>> Anyway, init crashes somewhere.
>>> 2043 MiB RAM and 64533 caps assigned to init
>>> no RM attachment (READ pf_addr=0x81 pf_ip=0x9f3f4 from pager_object: pd='init' thread='init')
>>> Warning: page fault, pager_object: pd='init' thread='init' ip=0x9f3f4 fault-addr=0x81 type=no-page
>> I *think* it fails right at the start of the first child. It seems OK around startup. The last memcpy seems OK too: I have printed both buffers and they are identical.
> I'm wondering why you replaced the general memcpy() in the first place, as this has maximal (potentially negative) impact in many places. Alternatively, you could just optimize your framebuffer blitting and proceed with your porting work.
Yes, you are absolutely right. I just got a bit excited about what might be gained.
> Also, it is not known which NEON instructions and registers your memcpy utilizes. I'm not an ARM expert but: Does the current FPU-switching implementation of base-hw suffice or do you need to save/restore extended state? As you do not share your developments (i.e. source code) with the public it is hard to provide specific help.
Right again.
> Any information on which code is at 0x9f3f4 in ld.lib.so?
No, sorry. At the top level, it crashes when starting the first ELF, init. It seems to be able to start the thread but fails in the jump (or something).
Anyway I will stop doing this little hack. Sorry for sending unwanted email and thanks for reading!
About the blitting: I will do benchmarks with and without clipping/flushing.
Also: I want to put things together in a target tree and publish that on github and at the same time rebase everything to git level. I think this is the task I will do first. Things are scattered across several repos now, and it is quite hard to handle. I also want to be open with what I am doing.
Thanks for your patience,
Michael
Hi Michael,
> About the blitting: I will do benchmarks with and without clipping/flushing.
for such microbenchmarking, let me point you to the GENODE_LOG_TSC utility [1], which makes this dead easy.
[1] https://genodians.org/nfeske/2021-04-07-performance
Cheers
Norman
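For trying the same kind of measurement outside of Genode first, a portable stand-in can be sketched with `std::chrono`. The following C++ helper is not the GENODE_LOG_TSC utility mentioned above (see the linked article for that); `average_ns` is a made-up name, and it simply reports the average wall-clock time per call of whatever callable is passed in.

```cpp
#include <chrono>

// Hypothetical helper: run `fn` `iterations` times and return the average
// duration of one call in nanoseconds. Returns 0 if iterations is 0.
template <typename FN>
double average_ns(unsigned iterations, FN const &fn)
{
    using clock = std::chrono::steady_clock;

    auto const start = clock::now();
    for (unsigned i = 0; i < iterations; ++i)
        fn();
    auto const end = clock::now();

    // Elapsed time as a floating-point count of nanoseconds.
    std::chrono::duration<double, std::nano> const elapsed = end - start;

    return iterations ? elapsed.count() / iterations : 0.0;
}
```

A clipped and an unclipped blit could each be wrapped in a lambda and passed to this helper with the same iteration count. Averaging over many calls matters because a single blit may be shorter than the clock resolution.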
On Tue, 24 Jan 2023 at 09:48, Norman Feske norman.feske@genode-labs.com wrote:
> Hi Michael,
>> About the blitting: I will do benchmarks with and without clipping/flushing.
> for such microbenchmarking, let me point you to the GENODE_LOG_TSC utility [1], which makes this dead easy.
Thanks, I will check it out!
I have rebased my code to git level and moved it into a separate tree now. Can I have the start file, crt0.s, in my tree? I still need register clearing.
I will put it on GitHub asap. I have based it on Allwinner, so I need to remove the Allwinner file and (not strictly necessary for the first commit) change the header ifdefs.
Michael
> I have rebased my code to git level and moved it into a separate tree now. Can I have the start file, crt0.s, in my tree? I still need register clearing.
> I will put it on GitHub asap. I have based it on Allwinner, so I need to remove the Allwinner file and (not strictly necessary for the first commit) change the header ifdefs.
A little issue: my version of the USB driver relies on headers in dde_linux. Is there a way to add "dde_linux/src/include/spec/arm_64" to my target.mk as an incdir? I can hardcode it, but naturally it needs to be dynamic.