Author Topic: [solved] AVX register slower than SSE register

shaynox
[solved] AVX register slower than SSE register
« on: December 10, 2014, 10:49:29 AM »
Hello, I don't know why, but when I use the AVX registers (ymm) my program (3D rendering) slows down. Here is an example from my code:

Code:
VBROADCASTF128 ymm2, [rotate_yz_ymm2]
VBROADCASTF128 ymm3, [rotate_xyz_ymm3]
VBROADCASTF128 ymm4, [rotate_yz_ymm4]
VBROADCASTF128 ymm5, [rotate_z_ymm5]
VBROADCASTF128 ymm6, [rotate_y_ymm6]
VBROADCASTF128 ymm7, [coordonee]

This code drops my frame rate by about 23 fps: I get 4 fps with it, and 27 fps with this version:

Code:
vmovups xmm2, [rotate_yz_ymm2]
vmovups xmm3, [rotate_xyz_ymm3]
vmovups xmm4, [rotate_yz_ymm4]
vmovups xmm5, [rotate_z_ymm5]
vmovups xmm6, [rotate_y_ymm6]
vmovups xmm7, [coordonee]

My CPU is an i7-2640M.
I hope I'm the only one who has this problem -_-


Thanks.
« Last Edit: December 10, 2014, 03:55:08 PM by shaynox »

Frank Kotler (NASM Developer)
Re: AVX register slower than SSE register
« Reply #1 on: December 10, 2014, 11:35:35 AM »
Hi Shaynox,

Okay, I've deleted the double post. Thanks for bringing it to our attention. I don't know the answer to your question, though. Anybody?

Best,
Frank


shaynox
Re: AVX register slower than SSE register
« Reply #2 on: December 10, 2014, 03:47:24 PM »
I finally solved the problem. Someone on the Intel Developer Zone told me this:
"Does your code intermixes SSE and AVX instructions in the same code path? If it does then you have SSE-to-AVX transition penalties. In order to solve it use vzeroupper instruction."

vzeroupper didn't work for me, but after I converted all my SSE instructions to their v-prefixed (AVX) forms, it works perfectly: no more slowdown.
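
For anyone hitting the same issue, here is a minimal sketch of the difference, not the actual renderer code. It assumes 64-bit NASM (`nasm -f elf64`), and the labels `vec_a`, `vec_b`, `mixed_path` and `vex_path` are made up for illustration. On Sandy Bridge CPUs such as the i7-2640M, running a legacy-encoded SSE instruction while the upper halves of the ymm registers hold data forces a costly state transition. The v-prefixed (VEX-encoded) forms avoid it.

```nasm
; Hypothetical example: labels and data are illustrative assumptions.
default rel

section .data
align 16
vec_a:  dd 1.0, 2.0, 3.0, 4.0
vec_b:  dd 0.5, 0.5, 0.5, 0.5

section .text
global mixed_path
global vex_path

; Slow pattern: a 256-bit AVX instruction followed by a legacy SSE one.
mixed_path:
    vbroadcastf128 ymm2, [vec_a]    ; 256-bit AVX: upper half of ymm2 now holds data
    mulps   xmm2, [vec_b]           ; legacy SSE encoding: AVX-to-SSE transition penalty
    vzeroupper                      ; clean the upper halves before returning
    ret

; Fast pattern: every instruction is VEX-encoded, so no transition occurs.
vex_path:
    vbroadcastf128 ymm2, [vec_a]    ; 256-bit AVX
    vmulps  xmm2, xmm2, [vec_b]     ; VEX.128 form: also zeroes the upper half of ymm2
    vzeroupper                      ; clean the upper halves before returning
    ret
```

If some legacy SSE code cannot be converted (a library routine, for example), the usual advice is to execute vzeroupper after the last 256-bit instruction and before the first legacy SSE instruction. Placing it elsewhere does not avoid the transition, which may be why it appeared not to help here.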

Frank Kotler (NASM Developer)
Re: [solved] AVX register slower than SSE register
« Reply #3 on: December 10, 2014, 04:10:10 PM »
Good! Thanks for sharing the solution with us.

Best,
Frank