NASM - The Netwide Assembler

NASM Forum => Programming with NASM => Topic started by: shaynox on December 10, 2014, 10:49:29 AM

Title: [solved] AVX register slower than SSE register
Post by: shaynox on December 10, 2014, 10:49:29 AM: Hello, I don't know why, but when I access the AVX registers (ymm*), my program (a 3D renderer) slows down. Here is one example from my code:

Code: [Select]
VBROADCASTF128 ymm2, [rotate_yz_ymm2]
VBROADCASTF128 ymm3, [rotate_xyz_ymm3]
VBROADCASTF128 ymm4, [rotate_yz_ymm4]
VBROADCASTF128 ymm5, [rotate_z_ymm5]
VBROADCASTF128 ymm6, [rotate_y_ymm6]
VBROADCASTF128 ymm7, [coordonee]
This code drops my frame rate by about 23 fps: I get 4 fps with it and 27 fps with the following:

Code: [Select]
vmovups xmm2, [rotate_yz_ymm2]
vmovups xmm3, [rotate_xyz_ymm3]
vmovups xmm4, [rotate_yz_ymm4]
vmovups xmm5, [rotate_z_ymm5]
vmovups xmm6, [rotate_y_ymm6]
vmovups xmm7, [coordonee]

My CPU is an i7-2640M.
I hope I'm the only one who has this problem -_-

Thanks.
Title: Re: AVX register slower than SSE register
Post by: Frank Kotler on December 10, 2014, 11:35:35 AM: Hi Shaynox,

Okay, I've deleted the double post. Thanks for bringing it to our attention. I don't know the answer to your question, though. Anybody?

Best,
Frank
Title: Re: AVX register slower than SSE register
Post by: shaynox on December 10, 2014, 03:47:24 PM: I finally solved the problem. Someone on the Intel Developer Zone told me:
"Does your code intermix SSE and AVX instructions in the same code path? If it does, then you have SSE-to-AVX transition penalties. In order to solve it, use the vzeroupper instruction."

vzeroupper didn't work for me, but once I converted all my SSE instructions to their v-prefixed (AVX) forms, it worked perfectly and the slowdown is gone.
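
To illustrate the fix described above, here is a minimal sketch of the difference between a code path that mixes 256-bit AVX with legacy SSE encodings and one that uses only VEX-encoded (v-prefixed) instructions. The labels `vec_a`, `vec_b`, `mixed_path` and `vex_only_path` are hypothetical and not taken from the original program, and the sketch assumes a 64-bit target (for example `nasm -f elf64`). It only shows the encoding change, not the original rendering code.

```nasm
; Sketch only: all labels and data below are hypothetical.
default rel

section .data
align 16
vec_a:  dd 1.0, 2.0, 3.0, 4.0
vec_b:  dd 0.5, 0.5, 0.5, 0.5

section .text
global mixed_path
global vex_only_path

; Mixed path: a 256-bit AVX instruction followed by legacy SSE encodings.
; On CPUs such as Sandy Bridge this pattern can incur the transition
; penalty mentioned in the quoted reply.
mixed_path:
    vbroadcastf128 ymm2, [vec_a]    ; VEX-encoded, writes the upper half of ymm2
    movups  xmm3, [vec_b]           ; legacy SSE encoding (no v- prefix)
    mulps   xmm3, xmm2              ; legacy SSE encoding
    vzeroupper                      ; clear upper halves before returning
    ret

; VEX-only path: the same work with every instruction in its v- form.
vex_only_path:
    vbroadcastf128 ymm2, [vec_a]    ; VEX-encoded
    vmovups xmm3, [vec_b]           ; VEX form, also zeroes bits 255:128 of ymm3
    vmulps  xmm3, xmm3, xmm2        ; three-operand VEX form of mulps
    vzeroupper                      ; clear upper halves in case the caller uses legacy SSE
    ret
```

Note that the VEX forms of arithmetic instructions take a separate destination operand (`vmulps xmm3, xmm3, xmm2`), so converting is not always a matter of just adding the `v`. A `vzeroupper` before leaving AVX code is still commonly recommended when the caller or a library might execute legacy SSE instructions.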
Title: Re: [solved] AVX register slower than SSE register
Post by: Frank Kotler on December 10, 2014, 04:10:10 PM: Good! Thanks for sharing the solution with us.

Best,
Frank