Hi all, I'm exploring compiling a couple of targets in my Genode base-hw system to use the hardware FP support of the NEON coprocessor on an ARM Cortex-A8 SoC, to get some performance improvement. I can add a CC_OPT line containing -mfloat-abi=hard and -mfpu=neon-vfpv4 to the appropriate target.mk. That should result in the NEON FP unit being used to perform the calculations. If I want to use libc math functions, I assume I must recompile libc with the same compile options added. Are there build variables that allow me to do that for the libc library?
I'm currently building on Genode 15.02.
Thanks for any info.
Bob Stewart
Sent from my android device.
Hello Bob,
On Wed, Sep 30, 2015 at 03:14:26PM -0400, robjsstewart@...9... wrote:
I'm exploring compiling a couple of targets in my Genode base-hw system to use the hardware FP support of the NEON coprocessor on an ARM Cortex-A8 SoC, to get some performance improvement. I can add a CC_OPT line containing -mfloat-abi=hard and -mfpu=neon-vfpv4 to the appropriate target.mk. That should result in the NEON FP unit being used to perform the calculations. If I want to use libc math functions, I assume I must recompile libc with the same compile options added. Are there build variables that allow me to do that for the libc library?
With 15.02, you may try to patch repos/libports/lib/mk/arm/libm.mk accordingly. There is no way to just "configure" compilation options for libc that I know of.
Regards
Thanks, Christian. I'll modify that make file.
Bob
-----Original Message----- From: Christian Helmuth <christian.helmuth@...1...> To: genode-main@lists.sourceforge.net Sent: Thu, 01 Oct 2015 5:10 AM Subject: Re: Using ARM/Neon fp hardware with libc
It's a bit more complicated than just setting the correct compiler options in libm.mk. Compiling the various math functions for hardfp works OK, as you would expect. The issue comes when building the .so library: the linker call is set up to merge the software-FP version of libgcc.a. I'm having trouble following the library build process to determine where I can switch to using .../fpu/libgcc.a.
Any hints would be appreciated.
Bob
I figured out the best way to get this to work after going through the base build makefiles. The clean way is to add the compiler flags for hardware FP to the CC_MARCH build variable. That causes LIBGCC to be defined correctly, so the build picks up the libgcc version that passes parameters in hardware FP registers. Putting the additions to CC_MARCH in the spec file for the base-hw platform being used allows hardware FP support to be compiled in wherever it is needed. The base-linux mk spec file for ARM uses this approach.
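One way to check that a target really picked up the flags is to look at the macros the compiler predefines. The following is only a minimal sketch, not part of Genode: the function names are hypothetical, and it assumes a GCC-compatible compiler that defines `__ARM_PCS_VFP` for the hard-float calling convention, `__SOFTFP__` for pure software FP, and `__ARM_NEON` (or the older `__ARM_NEON__`) when NEON is enabled.

```cpp
#include <string>

// Hypothetical helper: reports the floating-point ABI this translation
// unit was compiled for, judged only by compiler-predefined macros.
std::string float_abi()
{
#if defined(__ARM_PCS_VFP)
    return "hard";     // FP arguments are passed in FP registers
#elif defined(__SOFTFP__)
    return "soft";     // software floating point
#elif defined(__arm__)
    return "softfp";   // FP instructions, integer-register calling convention
#else
    return "not-arm";  // compiled for a non-ARM (32-bit) target
#endif
}

// Hypothetical helper: true if the compiler was told that NEON is available.
bool neon_enabled()
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return true;
#else
    return false;
#endif
}
```

Printing the result of `float_abi()` from a test component built with the modified CC_MARCH should show "hard" if the flags took effect for that target.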
Bob
-----Original Message----- From: robjsstewart@...196... To: genode-main@lists.sourceforge.net Sent: Fri, 02 Oct 2015 4:24 PM Subject: Re: Using ARM/Neon fp hardware with libc
Hi Bob,
On 09/30/2015 09:14 PM, robjsstewart@...9... wrote:
Hi all, I'm exploring compiling a couple of targets in my Genode base-hw system to use the hardware FP support of the NEON coprocessor on an ARM Cortex-A8 SoC, to get some performance improvement. I can add a CC_OPT line containing -mfloat-abi=hard and -mfpu=neon-vfpv4 to the appropriate target.mk. That should result in the NEON FP unit being used to perform the calculations. If I want to use libc math functions, I assume I must recompile libc with the same compile options added. Are there build variables that allow me to do that for the libc library?
I just wanted to add that FPU context switching is currently implemented in base-hw for Cortex A9 only. It might be no big deal to enable it for Cortex A8 by looking at the other implementation. If more than one of your components uses the FPU, this has to be done first.
Regards Stefan
genode-main mailing list genode-main@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/genode-main
Thanks, Stefan, I'll take a look at the context switching implementation for the Cortex A9.
Bob
-----Original Message----- From: Stefan Kalkowski <stefan.kalkowski@...1...> To: genode-main@lists.sourceforge.net Sent: Tue, 06 Oct 2015 2:35 AM Subject: Re: Using ARM/Neon fp hardware with libc
Stefan, so, is porting the Cpu_lazy_state from cpu.h in the core spec for the Cortex A9 to cpu.h in the Cortex A8 spec all that is required?
Bob
-----Original Message----- From: robjsstewart@...196... To: genode-main@lists.sourceforge.net Sent: Tue, 06 Oct 2015 7:23 AM Subject: Re: Using ARM/Neon fp hardware with libc
Hi Bob,
On 06.10.2015 14:23, robjsstewart@...9... wrote:
Stefan, So, implementing the Cpu_lazy_state in cpu.h in the core spec for the Cortex A9 into cpu.h for the Cortex A8 spec is all that is required?
The FPU support for Cortex-A9 in base-hw works as follows:
Initially, the kernel turns the FPU off, regardless of whether an FPU is supported. So, if a thread issues an FPU instruction, it always traps to the kernel with an 'UNDEFINED_INSTRUCTION' exception first. 'Thread::exception' in [1] then asks the CPU-specific code whether to retry the unknown instruction. On most CPUs, that code does nothing more than return "no". But on Cortex-A9 CPUs that have the FP/SIMD extensions, the method 'Cpu::retry_undefined_instr' in [2] does more. First, it toggles the FPU on. Then, it checks who the last user of the FPU was. If the last user was a thread other than the one that now triggered the 'UNDEFINED_INSTRUCTION', the FPU context has to be switched.
The FPU context is part of the so-called "lazy CPU state". This state is part of every thread and is called "lazy" because, in contrast to the common CPU state of a thread (r0-r12,ip,sp,...), it isn't necessarily switched on every thread switch. At the end of 'Cpu::retry_undefined_instr', the kernel records the new user and tells 'Thread::exception' to let the application retry its last instruction. Now that the FPU is on and contains the correct context, the instruction succeeds.
However, the context of the thread remains in the FPU. To enable other threads to use the FPU in parallel, the kernel has to ensure that it gets the opportunity to switch the context again when necessary. Thus, it disables the FPU again as soon as the thread that enabled it gets scheduled away. The hook for this is the CPU-specific method 'Cpu::prepare_proceeding' [2], which is called on every kernel pass in 'Cpu::exception' [3].
In a nutshell, the FPU is turned on only on an FPU instruction and turned off again as soon as the thread that issued the FPU instruction loses the scheduling focus. The FPU context is switched only on an FPU instruction that is issued by a thread other than the one that issued the last FPU instruction.
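The scheme described above can be modeled in a few lines of host-side C++. This is only a simulation under stated assumptions, not the base-hw code: the types 'Fpu_context', 'Thread', and 'Kernel' and the helpers 'fpu_write_d0' and 'fpu_read_d0' are hypothetical, the FPU is a plain struct, and an "FPU instruction" is a function call that traps whenever the simulated FPU is off.

```cpp
#include <array>

// Hypothetical stand-in for the FPU register bank (the "lazy CPU state").
struct Fpu_context { std::array<double, 32> d {}; };

// Every thread carries its own copy of the lazy state.
struct Thread { Fpu_context lazy_state {}; };

// Simulated kernel with a simulated FPU (enable flag plus registers).
struct Kernel
{
    bool        fpu_enabled = false;      // FPU starts turned off
    Fpu_context fpu_regs {};
    Thread     *last_fpu_user = nullptr;
    unsigned    context_switches = 0;     // counts FPU context loads

    // Models the undefined-instruction path: returns true if the
    // faulting instruction should be retried.
    bool retry_undefined_instr(Thread &t)
    {
        if (fpu_enabled) return false;    // FPU was on: genuine fault
        fpu_enabled = true;
        if (last_fpu_user != &t) {
            if (last_fpu_user)
                last_fpu_user->lazy_state = fpu_regs;  // save old user
            fpu_regs = t.lazy_state;                   // load new user
            ++context_switches;
        }
        last_fpu_user = &t;
        return true;
    }

    // Models the scheduling hook: turn the FPU off when the running
    // thread changes, so that the next FPU instruction traps again.
    void prepare_proceeding(Thread &from, Thread &to)
    {
        if (&from != &to) fpu_enabled = false;
    }

    // Models a thread executing an FPU instruction that writes d0.
    void fpu_write_d0(Thread &t, double value)
    {
        if (!fpu_enabled) retry_undefined_instr(t);  // trap, then retry
        fpu_regs.d[0] = value;
    }

    // Models a thread executing an FPU instruction that reads d0.
    double fpu_read_d0(Thread &t)
    {
        if (!fpu_enabled) retry_undefined_instr(t);  // trap, then retry
        return fpu_regs.d[0];
    }
};
```

In this model, a thread that is scheduled away and back without any other thread touching the FPU still traps once on its next FPU instruction, but no context is copied because it is still the last user.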
To enable FPU support for Cortex A8, you would have to implement 'Arm_v7::finish_init_phys_kernel' (initialize but toggle off FPU), 'Genode::Cpu_lazy_state', 'Cpu::retry_undefined_instr', and 'Cpu::prepare_proceeding'. All in [4].
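As a rough illustration of the "initialize but toggle off" step, the sketch below computes the register values involved. It assumes the ARMv7-A layout in which CPACR bits [21:20] and [23:22] grant access to coprocessors 10 and 11, and FPEXC bit 30 (EN) switches the FPU on or off. The function names are hypothetical, and the privileged register accesses themselves (MRC/MCR for CPACR, VMRS/VMSR for FPEXC) are not shown.

```cpp
#include <cstdint>

// Assumed ARMv7-A bit positions; names are illustrative only.
constexpr std::uint32_t CPACR_CP10_SHIFT  = 20;        // cp10 access field
constexpr std::uint32_t CPACR_CP11_SHIFT  = 22;        // cp11 access field
constexpr std::uint32_t CPACR_FULL_ACCESS = 0x3;       // 0b11: full access
constexpr std::uint32_t FPEXC_EN          = 1u << 30;  // FPU enable bit

// Grant full access to cp10 and cp11 (done once during initialization).
constexpr std::uint32_t cpacr_allow_fpu(std::uint32_t cpacr)
{
    return cpacr | (CPACR_FULL_ACCESS << CPACR_CP10_SHIFT)
                 | (CPACR_FULL_ACCESS << CPACR_CP11_SHIFT);
}

// Toggle the FPU on, e.g. when retrying an undefined instruction.
constexpr std::uint32_t fpexc_enable(std::uint32_t fpexc)
{
    return fpexc | FPEXC_EN;
}

// Toggle the FPU off, e.g. after init or when a thread is scheduled away.
constexpr std::uint32_t fpexc_disable(std::uint32_t fpexc)
{
    return fpexc & ~FPEXC_EN;
}

// Check whether the FPU is currently toggled on.
constexpr bool fpexc_enabled(std::uint32_t fpexc)
{
    return (fpexc & FPEXC_EN) != 0;
}
```

With the coprocessor access granted but EN cleared, the first FPU instruction of any thread raises the undefined-instruction exception that the lazy-switching code relies on.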
If you have further questions, don't hesitate to ask ;)
Cheers, Martin
[1] repos/base-hw/src/core/spec/arm/kernel/thread.cc
[2] repos/base-hw/src/core/include/spec/cortex_a9/cpu.h
[3] repos/base-hw/src/core/kernel/cpu.cc
[4] repos/base-hw/src/core/include/spec/cortex_a8/cpu.h
Thanks, Martin. I went through the cpu.h code for the Cortex A9 yesterday and understood what the functions were doing. With your explanation, I now understand how they are used.
I'll run your fpu test script on Friday and check that my implementation for the Cortex A8 is correct.
Bob
-----Original Message----- From: Martin Stein <martin.stein@...1...> To: Genode OS Framework Mailing List genode-main@lists.sourceforge.net Sent: Wed, 07 Oct 2015 4:09 AM Subject: Re: Using ARM/Neon fp hardware with libc