NASM - The Netwide Assembler
NASM Forum => Programming with NASM => Topic started by: shaynox on December 10, 2014, 10:49:29 AM
-
Hello, I don't know why my program (a 3D renderer) slows down when I access the AVX registers (ymm). Here is an example from my code:
VBROADCASTF128 ymm2, [rotate_yz_ymm2]
VBROADCASTF128 ymm3, [rotate_xyz_ymm3]
VBROADCASTF128 ymm4, [rotate_yz_ymm4]
VBROADCASTF128 ymm5, [rotate_z_ymm5]
VBROADCASTF128 ymm6, [rotate_y_ymm6]
VBROADCASTF128 ymm7, [coordonee]
This code drops my frame rate by about 23 fps: I get 4 fps with it, versus 27 fps with this:
vmovups xmm2, [rotate_yz_ymm2]
vmovups xmm3, [rotate_xyz_ymm3]
vmovups xmm4, [rotate_yz_ymm4]
vmovups xmm5, [rotate_z_ymm5]
vmovups xmm6, [rotate_y_ymm6]
vmovups xmm7, [coordonee]
My CPU is an i7-2640M.
I hope I'm the only one with this problem -_-
Thanks.
-
Hi Shaynox,
Okay, I've deleted the double post. Thanks for bringing it to our attention. I don't know the answer to your question, though. Anybody?
Best,
Frank
-
I finally solved the problem. Someone on the Intel Developer Zone told me:
"Does your code intermix SSE and AVX instructions in the same code path? If it does, then you have SSE-to-AVX transition penalties. To solve it, use the vzeroupper instruction."
vzeroupper didn't help, but when I converted all my SSE instructions to their v-prefixed (VEX-encoded AVX) forms, it worked perfectly: no more slowdown.
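A minimal sketch of the fix, assuming Linux x86-64 and an AVX-capable CPU. The labels vec_a/vec_b, the file name demo.asm and the arithmetic are hypothetical, not taken from the renderer in this thread. Inside AVX code, 128-bit work uses the VEX-encoded (v-prefixed) form. vzeroupper appears only when leaving the AVX section. It clears the upper 128 bits of every ymm register, so it cannot sit between instructions that still need those upper halves. That may be why it did not help here.

```nasm
; Assemble/link (assumes Linux x86-64): nasm -f elf64 demo.asm -o demo.o && ld demo.o -o demo
default rel

section .data
align 16
vec_a:  dd 1.0, 2.0, 3.0, 4.0        ; hypothetical input data
vec_b:  dd 2.0, 2.0, 2.0, 2.0

section .text
global _start
_start:
    ; 256-bit AVX work: broadcast each 128-bit vector into both ymm halves
    vbroadcastf128 ymm2, [vec_a]
    vbroadcastf128 ymm3, [vec_b]
    vaddps  ymm2, ymm2, ymm3          ; ymm2 = 3,4,5,6 | 3,4,5,6

    ; 128-bit step in the middle of AVX code: use the VEX forms.
    ; Legacy "movups xmm0, [vec_a]" / "mulps xmm0, xmm1" here would risk an
    ; SSE/AVX transition penalty on Sandy Bridge CPUs such as the i7-2640M.
    vmovups xmm0, [vec_a]
    vmovups xmm1, [vec_b]
    vmulps  xmm0, xmm0, xmm1          ; xmm0 = 2,4,6,8 (upper half of ymm0 zeroed)

    vaddps  ymm2, ymm2, ymm2          ; keep going with 256-bit work

    ; Self-check: lane 3 of the product should be 4.0 * 2.0 = 8.0
    vextractps eax, xmm0, 3

    ; Leaving the AVX section: clear upper halves before any legacy-SSE code
    vzeroupper

    xor  edi, edi
    cmp  eax, 0x41000000              ; IEEE-754 single-precision bits of 8.0
    setne dil                         ; exit status 0 on success, 1 otherwise
    mov  eax, 60                      ; Linux sys_exit
    syscall
```

One difference to watch when converting: a VEX.128 instruction such as vmulps xmm0, xmm0, xmm1 zeroes bits 255:128 of ymm0, while the legacy mulps xmm0, xmm1 leaves them unchanged. Code that relied on those upper bits surviving a legacy SSE instruction needs a second look.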
-
Good! Thanks for sharing the solution with us.
Best,
Frank