Hey guys!
I have SSE code that computes a matrix multiplication using single-precision floats. I need the same code for AVX, where I will work with double-precision floats. I know the pointers in the general-purpose registers will have to advance twice as far per step, and so on.
Here is the SSE code:
vxm_sse1:
push dword ebp
mov dword ebp,esp
sub esp,4 ;reserve for one variable 4B
mov dword [ebp - 4],3 ;set the value of the variable
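mov eax,16 ;16 bytes = 4 single-precision floats per xmm register
mul dword [ebp + 24] ;eax = row stride in bytes (mul also overwrites edx)
push eax ;save the row stride
mul dword [ebp - 4] ;eax = 3 * row stride
mov dword [ebp - 4],eax ;the variable now holds 3 * row stride
pop eax ;eax = row stride again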
mov esi,[ebp + 12] ;input vector pointer
mov ecx,[ebp + 20] ;length of the input vector
mov edx,[ebp + 8] ;matrix pointer
.invec1: ;start of the loop1
movups xmm0,[esi]
mov edi,[ebp + 16] ;output vector pointer
mov ebx,[ebp + 24] ;length of the output vector
.radmat1: ;start of the loop2
movups xmm4,[edx]
add edx,eax
movups xmm5,[edx]
add edx,eax
movups xmm6,[edx]
add edx,eax
movups xmm7,[edx]
;COMPUTATION CODE HERE, not important for the issue
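add edi,16 ;next 4 floats of the output vector
sub edx,dword [ebp - 4] ;go back 3 rows, to the first of the 4 rows
add edx,16 ;next 4 columns of the matrix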
dec ebx
jnz .radmat1
add esi,16 ;next 4 floats of the input vector
add edx,dword [ebp - 4] ;skip the other 3 rows of the block just processed
dec ecx
jnz .invec1
mov dword esp,ebp
pop dword ebp
ret 20 ;return and remove the 5 arguments (5 * 4B) from the stack
_________________________________________________________________________
Well, I will probably have to change all the offsets for the matrix, the input/output vectors and their lengths. I tried several versions with doubled EBP offsets, but it still does not work.
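To pin down what I think has to change, here is a small C sketch of the byte arithmetic (C only because it is easier to check than assembly). The helper names are made up for illustration. It assumes the lengths are counted in whole registers (groups of 4 elements), as the SSE code does, and that the build stays 32-bit, so the five arguments would keep their [ebp + 8] to [ebp + 24] slots.

```c
#include <stddef.h>

/* Hypothetical helper names, for illustration only. */

/* Bytes in one vector register: 16 for xmm (SSE), 32 for ymm (AVX). */
enum { XMM_BYTES = 16, YMM_BYTES = 32 };

/* Number of elements that fit in one register. */
size_t lanes(size_t reg_bytes, size_t elem_bytes)
{
    return reg_bytes / elem_bytes;
}

/* Distance in bytes between two matrix rows. In the SSE code this is
   the value left in eax: 16 * (length of the output vector). */
size_t row_stride(size_t out_len, size_t reg_bytes)
{
    return out_len * reg_bytes;
}

/* Value the SSE code keeps in [ebp - 4]: three row strides. */
size_t three_rows(size_t out_len, size_t reg_bytes)
{
    return 3 * row_stride(out_len, reg_bytes);
}
```

Both lanes(XMM_BYTES, sizeof(float)) and lanes(YMM_BYTES, sizeof(double)) give 4, so the loop counts would stay the same. With an output length of 2 the row stride is 32 bytes for SSE and 64 bytes for AVX, so as far as I can tell it is every 16 in the loop that should become 32, not the EBP offsets.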
Could you please help me? Thank you!