Thanks for the replies, they were helpful!
I wasn't using the -O3 optimization flag for the code running in either the Normal World (NW) or the Secure World (SW). Now I am, and the execution times in the NW and the SW are pretty similar for the example I was testing.
Now I'm testing another example and I'm getting some interesting results. The code below performs an image transformation (a grayscale conversion). I go through every position in an array of integers and set each value in the new array to a slightly modified version of the old value:
// start timer here
for (i = 0; i < size; i++) {
    color = oldp[i];
    alpha = (color >> 24) & 0xff;
    red = (color >> 16) & 0xff;
    green = (color >> 8) & 0xff;
    blue = color & 0xff;
    lum = (int) (red * 0.299 + green * 0.587 + blue * 0.114);
    newp[i] = (alpha << 24) | (lum << 16) | (lum << 8) | lum;
}
// end timer here
// check timer diff and print result
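For anyone who wants to reproduce the measurement, here is a minimal self-contained sketch of the same loop wrapped in a timing helper. Several things in it are assumptions and not taken from my actual code: the function names to_grayscale and time_grayscale_ms are hypothetical, the pixels are declared as uint32_t (the snippet above does not show the declarations), and the timer uses POSIX clock_gettime, which is probably only available on the NW side. In the SW the two clock_gettime calls would have to be replaced by whatever timer the secure environment provides.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Convert ARGB pixels to grayscale, keeping the alpha channel.
 * Same arithmetic as the loop above; uint32_t is an assumption. */
static void to_grayscale(const uint32_t *oldp, uint32_t *newp, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        uint32_t color = oldp[i];
        uint32_t alpha = (color >> 24) & 0xff;
        uint32_t red   = (color >> 16) & 0xff;
        uint32_t green = (color >> 8) & 0xff;
        uint32_t blue  = color & 0xff;
        /* Floating-point weighted sum, truncated to an integer. */
        uint32_t lum = (uint32_t)(red * 0.299 + green * 0.587 + blue * 0.114);
        newp[i] = (alpha << 24) | (lum << 16) | (lum << 8) | lum;
    }
}

/* Hypothetical helper: run one pass and return the elapsed time in ms.
 * Assumes a POSIX monotonic clock; swap in the platform timer otherwise. */
static double time_grayscale_ms(const uint32_t *oldp, uint32_t *newp, size_t size)
{
    struct timespec t0, t1;

    assert(oldp != NULL && newp != NULL);
    clock_gettime(CLOCK_MONOTONIC, &t0);   /* start timer */
    to_grayscale(oldp, newp, size);
    clock_gettime(CLOCK_MONOTONIC, &t1);   /* end timer */

    return (double)(t1.tv_sec - t0.tv_sec) * 1e3
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
}
```

Calling time_grayscale_ms(oldp, newp, size) once per world with the same buffer and the same compiler flags should give numbers that are directly comparable.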
I'm testing this exact same code in both the Secure and Non-secure domains.
In the NW I'm getting about 155 ms of execution time, which seems reasonable for that buffer and transformation. In the SW, on the other hand, I'm getting about 610 ms.
I can't seem to find a reasonable explanation for this time difference, since the code running in both scenarios is exactly the same. The secure code is running inside the TZ_VMM example.
Do you have any idea what might be happening here?
Thanks in advance, Tiago