NASM - The Netwide Assembler

NASM Forum => Programming with NASM => Topic started by: shaynox on December 10, 2014, 10:49:29 AM

Title: [solved] AVX register slower than SSE register
Post by: shaynox on December 10, 2014, 10:49:29 AM
Hello, I don't know why my program (3D rendering) slows down when I access the AVX registers (ymm). Here is one example from my code:

Code:
VBROADCASTF128 ymm2, [rotate_yz_ymm2]
VBROADCASTF128 ymm3, [rotate_xyz_ymm3]
VBROADCASTF128 ymm4, [rotate_yz_ymm4]
VBROADCASTF128 ymm5, [rotate_z_ymm5]
VBROADCASTF128 ymm6, [rotate_y_ymm6]
VBROADCASTF128 ymm7, [coordonee]

This code drops my frame rate by about 23 fps: I get 4 fps with it and 27 fps with this instead:

Code:
vmovups xmm2, [rotate_yz_ymm2]
vmovups xmm3, [rotate_xyz_ymm3]
vmovups xmm4, [rotate_yz_ymm4]
vmovups xmm5, [rotate_z_ymm5]
vmovups xmm6, [rotate_y_ymm6]
vmovups xmm7, [coordonee]

My CPU is an i7-2640M.
I hope I'm the only one having this problem -_-


Thanks.
Title: Re: AVX register slower than SSE register
Post by: Frank Kotler on December 10, 2014, 11:35:35 AM
Hi Shaynox,

Okay, I've deleted the double post. Thanks for bringing it to our attention. I don't know the answer to your question, though. Anybody?

Best,
Frank

Title: Re: AVX register slower than SSE register
Post by: shaynox on December 10, 2014, 03:47:24 PM
I finally solved the problem. Someone on the Intel Developer Zone told me this:
"Does your code intermix SSE and AVX instructions in the same code path? If it does, then you have SSE-to-AVX transition penalties. To solve it, use the vzeroupper instruction."
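
A minimal sketch of where vzeroupper is meant to go, assuming Linux x86-64 and NASM syntax; the labels vec_a and vec_b are hypothetical. On Sandy Bridge-family CPUs such as the i7-2640M, a 256-bit AVX instruction leaves the upper halves of the ymm registers in a "dirty" state, and running a legacy (non-VEX) SSE instruction in that state forces a costly state transition. vzeroupper zeroes those upper halves, so it belongs at the boundary between 256-bit AVX code and legacy SSE code:

```nasm
; Hypothetical example: vec_a and vec_b are illustrative data.
        default rel
        global  _start

        section .data
vec_a:  dd      1.0, 2.0, 3.0, 4.0
vec_b:  dd      5.0, 6.0, 7.0, 8.0

        section .text
_start:
        ; 256-bit AVX code: these writes make the upper halves
        ; of the ymm registers "dirty"
        vbroadcastf128  ymm0, [vec_a]
        vbroadcastf128  ymm1, [vec_b]
        vaddps          ymm0, ymm0, ymm1

        ; Zero bits 255:128 of every ymm register before any
        ; legacy (non-VEX) SSE instruction executes
        vzeroupper

        ; Legacy SSE code path: no transition penalty now
        movups  xmm2, [vec_a]
        movups  xmm3, [vec_b]
        addps   xmm2, xmm3

        ; exit(0) via the Linux x86-64 exit syscall
        mov     eax, 60
        xor     edi, edi
        syscall
```

Assemble with nasm -f elf64 and link with ld; it needs an AVX-capable CPU. Because vzeroupper discards bits 255:128 of every ymm register, it can only be placed at a point where no 256-bit value is still needed.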

vzeroupper didn't work for me, but when I converted all my SSE instructions to their v-prefixed (VEX-encoded AVX) forms, it worked perfectly: no more slowdown.
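
A sketch of the fix described above, under the same assumptions (Linux x86-64, NASM syntax, hypothetical labels). A VEX-encoded 128-bit instruction such as vmovups or vmulps on xmm registers performs the same operation as its legacy SSE counterpart, but it zeroes bits 255:128 of the destination ymm register instead of preserving them, so no SSE/AVX transition occurs. The VEX forms also take an explicit destination operand:

```nasm
; Hypothetical example: vec_a and vec_b are illustrative data.
        default rel
        global  _start

        section .data
vec_a:  dd      1.0, 2.0, 3.0, 4.0
vec_b:  dd      0.5, 0.5, 0.5, 0.5

        section .text
_start:
        ; 256-bit AVX value; its upper half is assumed to be needed
        ; later, so vzeroupper is not an option at this point
        vbroadcastf128  ymm3, [vec_b]

        ; Legacy SSE forms (would trigger the transition here):
        ;   movups  xmm2, [vec_a]
        ;   mulps   xmm2, xmm3
        ;   addps   xmm2, xmm2

        ; VEX-encoded forms: same 128-bit math, explicit destination,
        ; and bits 255:128 of ymm2 are zeroed rather than preserved
        vmovups  xmm2, [vec_a]
        vmulps   xmm2, xmm2, xmm3       ; reads only the low half of ymm3
        vaddps   xmm2, xmm2, xmm2

        ; exit(0) via the Linux x86-64 exit syscall
        mov     eax, 60
        xor     edi, edi
        syscall
```

This is consistent with the first post: VBROADCASTF128 into a ymm register writes all 256 bits, while vmovups into an xmm register (VEX.128) zeroes the upper half, so only the first version leaves the upper state dirty for the legacy SSE code that follows.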
Title: Re: [solved] AVX register slower than SSE register
Post by: Frank Kotler on December 10, 2014, 04:10:10 PM
Good! Thanks for sharing the solution with us.

Best,
Frank