Hello Tiago,
On Thu, Jun 23, 2016 at 04:32:18PM +0100, Tiago Brito wrote:
> I did not check if the binary code is similar, but I did measure just the for-loop in both worlds and the times are those I described previously.
You really should compare the binary code, as the example that is slower in the SW uses floating-point arithmetic, unless I'm mistaken. If the code is similar and the execution times still differ significantly, there may be an issue with FPU handling in the SW.
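To make the comparison concrete, here is a minimal sketch of what I have in mind. It is not your code: fp_loop() and time_fp_loop() are hypothetical stand-ins for the loop you measured, and I assume a C library that provides clock() is available in both worlds. If it is not, substitute whatever timer you used for your measurements.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the measured for-loop; the real loop body
 * is not shown in this thread. 'volatile' keeps the compiler from
 * optimizing the floating-point work away. */
static double fp_loop(unsigned long iterations)
{
    volatile double acc = 0.0;
    unsigned long i;

    for (i = 0; i < iterations; i++)
        acc += (double)i * 0.5;

    return acc;
}

/* Run fp_loop() once and return the elapsed CPU time in seconds.
 * The loop result is stored in *result so the call cannot be dropped. */
static double time_fp_loop(unsigned long iterations, double *result)
{
    clock_t start, end;

    assert(result != NULL);

    start = clock();
    *result = fp_loop(iterations);
    end = clock();

    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Build the same source for both worlds with identical compiler flags and disassemble each binary (for example with "objdump -d"). If one build contains hardware floating-point instructions and the other calls soft-float helper routines, the binaries already explain the difference. If the loop bodies match, the time reported by time_fp_loop() is the number to compare, and a large gap then points towards FPU handling in the SW.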
Greets