Author Topic: My first attempt using nasm and adding two vectors!  (Read 7259 times)

Offline manler

  • New Member
  • Posts: 1
My first attempt using nasm and adding two vectors!
« on: December 23, 2012, 09:00:33 PM »
Hello everybody!

This is my first real attempt at using NASM to write a function that I link into a C++ program.
The function adds up the elements of two integer vectors and returns the total. So it isn't the world's most useful function, but I wrote it to get the hang of NASM. It uses SSE instructions up to SSE4.1 (lddqu is SSE3, phaddd is SSSE3, pextrd is SSE4.1).

Am I doing the right things here? I mean things like:
register use?
how to set up & tear down a function? enter/leave?
reading args passed to the function?

Please give me some feedback.
My main goal with using nasm is to create functions that link into c++ programs.
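
For readers following along: a NASM routine exported as `_addvectors` is normally declared on the C++ side with C linkage, so that the compiler does not mangle the name. The snippet below is a minimal sketch, assuming a 32-bit Windows cdecl build, where the toolchain adds the leading underscore to C symbols itself. The C++ body is only a hypothetical stand-in so that the snippet compiles without the assembled object file.

```cpp
#include <cassert>

// C linkage: the symbol is emitted without C++ name mangling, so it can
// match the label exported from the NASM object file.
extern "C" int addvectors(int *vec1, int *vec2, int elements);

// Hypothetical stand-in so this sketch links on its own. In the real build
// this definition is dropped and the NASM object file provides the symbol.
extern "C" int addvectors(int *vec1, int *vec2, int elements)
{
    assert(elements >= 0);
    int sum = 0;
    for (int i = 0; i < elements; ++i)
        sum += vec1[i] + vec2[i];
    return sum;
}
```

With the declaration in place, a call such as `addvectors(a, b, n)` from C++ looks like any other function call.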


In C++ the function looks like this:
Code: [Select]
int addvectors_c(int *vec1, int *vec2, int elements)
{
    int sum = 0;
    for (int i = 0; i < elements; ++i)
    {
        sum += vec1[i] + vec2[i];
    }
    return sum;
}

In Nasm the function looks like this:

Code: [Select]
segment .data
segment .bss
segment .text
global _addvectors
;
; int addvectors(int *vec1, int *vec2, int elements);
;
_addvectors:
enter 0,0
pusha
mov esi, [ebp+8] ; first parameter vec1
mov ecx, [ebp+12] ; second parameter vec2
mov edx, [ebp+16] ; third parameter elements
shl edx, 2 ; convert to number of bytes since integer is 4 bytes.
add edx, esi ; calculate end of vec1
mov edi, esi
sub edi, 16               ; subtract 16 (one lddqu read) to find last index before we start reading 4 bytes at a time
pxor xmm0, xmm0
pxor xmm7, xmm7

; while(esi <= edi)
.more_bytes:
cmp edi, esi
jl .remaining_bytes
lddqu xmm1, [esi]                 ; Using unaligned load (also tried movdqu)
lddqu xmm2, [ecx]
paddd xmm0, xmm1 ; Add to xmm0
paddd xmm0, xmm2 ; Add to xmm0
add esi, 16
add ecx, 16
jmp .more_bytes

.remaining_bytes:
cmp edx, esi
jle .calc_sum
movd xmm6, [esi]
movd xmm7, [ecx]
paddd xmm0, xmm6
paddd xmm0, xmm7
add esi, 4
jmp .remaining_bytes

.calc_sum:
popa
phaddd xmm0, xmm7 ; Do horizontal add of xmm0
phaddd xmm0, xmm7 ; Horizontal add finished
pextrd eax, xmm0, 0         ; Extract sum and put into eax for return value
leave
ret

Just as a side note, the asm version seems to be faster than the C++ version when the vectors are larger than 1024*800 bytes. Below that size the C++ version is faster, and when the functions are called many times with small vectors, C++ is much faster. I'm guessing that MSVC 2012 inlines the code and removes the actual function call?
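
A comparison like this is sensitive to how the calls are timed. The following is a rough sketch of a timing helper, not the harness used for the numbers above: `time_calls`, `timing_result` and `sum_fn` are hypothetical names. It calls through a volatile function pointer so the optimizer cannot replace the call with an inlined copy, and it accumulates the return values so the calls cannot be discarded.

```cpp
#include <cassert>
#include <chrono>

using sum_fn = int (*)(int *, int *, int);

struct timing_result {
    long long checksum;  // sum of all return values; keeps the calls observable
    double seconds;      // wall-clock time for all repeats together
};

// Calls fn `repeats` times on the same inputs and measures the total time.
timing_result time_calls(sum_fn fn, int *vec1, int *vec2, int elements, int repeats)
{
    assert(elements >= 0 && repeats >= 0);
    volatile sum_fn call = fn;  // re-read on every call, so no inlining through it
    long long checksum = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r)
        checksum += call(vec1, vec2, elements);
    const auto stop = std::chrono::steady_clock::now();
    return { checksum, std::chrono::duration<double>(stop - start).count() };
}
```

Timing both versions through the same helper puts them on equal footing with respect to call overhead, which matters most for the small-vector case.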

Thank you for any tips and help!

/Mathias
« Last Edit: December 23, 2012, 09:02:14 PM by manler »

Offline Rob Neff

  • Forum Moderator
  • Full Member
  • *****
  • Posts: 429
  • Country: us
Re: My first attempt using nasm and adding two vectors!
« Reply #1 on: December 24, 2012, 05:00:23 PM »
I would eliminate the enter/leave and the pusha/popa instructions. Under the C calling convention (which C++ also uses here), the only registers used by your function that must be non-volatile (appear unchanged to the calling function) are ESI and EDI. Thus the framework for function entry and exit could be as follows:
Code: [Select]
_addvectors:
    push  edi
    push  esi
    ; No frame pointer is set up, so address the arguments from esp:
    ; 2 saved registers (8 bytes) + return address (4 bytes) = 12
    mov   esi, [esp+12] ; first parameter vec1
    mov   ecx, [esp+16] ; second parameter vec2
    mov   edx, [esp+20] ; third parameter elements
    .
    .  ; implementation
    .
    pextrd eax, xmm0, 0 ; Extract sum and put into eax for return value
    pop   esi
    pop   edi
    ret

In the .more_bytes: loop this doesn't look right to me:
Code: [Select]
    add esi, 16
    add ecx, 16

Are you sure you don't mean:
Code: [Select]
    add esi, 4    ; point to next vec1 int
    add ecx, 4   ; point to next vec2 int

Have you verified that your vector addition function is arithmetically correct?
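
One way to answer that question is to compare the routine against the plain C++ loop for every length from 0 up to a few multiples of four, so that both the 16-byte loop and the leftover path are exercised. This is only a sketch: `matches_reference`, `candidate_sum` and `sum_fn` are hypothetical names, and `candidate_sum` is a C++ stand-in for the assembly routine.

```cpp
#include <cassert>
#include <vector>

// Reference: the plain loop from the first post.
int addvectors_c(int *vec1, int *vec2, int elements)
{
    int sum = 0;
    for (int i = 0; i < elements; ++i)
        sum += vec1[i] + vec2[i];
    return sum;
}

// Hypothetical stand-in for the routine under test; in a real check this
// would be the extern "C" assembly function instead.
int candidate_sum(int *vec1, int *vec2, int elements)
{
    return addvectors_c(vec1, vec2, elements);
}

using sum_fn = int (*)(int *, int *, int);

// Returns true if fn agrees with the reference for every length 0..max_elements.
bool matches_reference(sum_fn fn, int max_elements)
{
    assert(max_elements >= 0);
    std::vector<int> a(max_elements), b(max_elements);
    for (int i = 0; i < max_elements; ++i) {
        a[i] = i + 1;           // distinct values, so a skipped or repeated
        b[i] = 1000 * (i + 1);  // element changes the total
    }
    for (int n = 0; n <= max_elements; ++n)
        if (fn(a.data(), b.data(), n) != addvectors_c(a.data(), b.data(), n))
            return false;
    return true;
}
```

Lengths that are not a multiple of four are the interesting ones here, since they are the only ones that reach the leftover loop.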


« Last Edit: December 24, 2012, 05:07:43 PM by Rob Neff »

Offline Mathi

  • Jr. Member
  • *
  • Posts: 82
  • Country: in
    • Win32NASM
Re: My first attempt using nasm and adding two vectors!
« Reply #2 on: December 25, 2012, 03:59:13 AM »
Regarding the validity, the initialization
Code: [Select]
add edx, esi ; calculate end of vec1
mov edi, esi
sub edi, 16               ; subtract 16 (one lddqu read) to find last index before we start reading 4 bytes at a time

should have been
Code: [Select]
add edx, esi ; calculate end of vec1
mov edi, edx ; start from the end of vec1, not from its beginning
sub edi, 16               ; subtract 16 (one lddqu read) to find last index before we start reading 4 bytes at a time

We also need to increment ecx in the .remaining_bytes loop:
Code: [Select]
paddd xmm0, xmm6
paddd xmm0, xmm7
add esi, 4
add ecx, 4


Rob,
Quote
In the .more_bytes: loop this doesn't look right to me:
Code: [Select]

    add      esi, 16
    add      ecx, 16

I think it is correct. The idea here is to add 4 integers (16 bytes) at a time using paddd.
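
The same idea can be written with C++ intrinsics, where `_mm_add_epi32` is the intrinsic form of paddd. This is a sketch only, not a translation of the routine in the first post: the name `addvectors_sse2` is hypothetical, the horizontal add uses SSE2 shuffles instead of phaddd/pextrd so that nothing beyond SSE2 is required, and a plain scalar loop is used when the compiler does not report SSE2 support.

```cpp
#include <cassert>
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define ADDVECTORS_HAVE_SSE2 1
#endif

int addvectors_sse2(const int *vec1, const int *vec2, int elements)
{
    assert(elements >= 0);
    int i = 0;
    int sum = 0;
#ifdef ADDVECTORS_HAVE_SSE2
    __m128i acc = _mm_setzero_si128();            // like pxor xmm0, xmm0
    for (; elements - i >= 4; i += 4) {           // 4 ints = 16 bytes per step
        __m128i v1 = _mm_loadu_si128(reinterpret_cast<const __m128i *>(vec1 + i));
        __m128i v2 = _mm_loadu_si128(reinterpret_cast<const __m128i *>(vec2 + i));
        acc = _mm_add_epi32(acc, v1);             // paddd
        acc = _mm_add_epi32(acc, v2);             // paddd
    }
    // Horizontal add: fold the upper half onto the lower half, then fold
    // lane 1 onto lane 0, and read lane 0.
    acc = _mm_add_epi32(acc, _mm_shuffle_epi32(acc, _MM_SHUFFLE(1, 0, 3, 2)));
    acc = _mm_add_epi32(acc, _mm_shuffle_epi32(acc, _MM_SHUFFLE(2, 3, 0, 1)));
    sum = _mm_cvtsi128_si32(acc);
#endif
    for (; i < elements; ++i)                     // leftover elements
        sum += vec1[i] + vec2[i];                 // both arrays advance together
    return sum;
}
```

Using a single index for both arrays makes it impossible to advance one pointer and forget the other.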

Obviously the C compiler didn't use the XMM registers.
But I guess there is still scope for improving performance in the asm routine.
All the Best!

Regards,
Mathi.

Offline Mathi

  • Jr. Member
  • *
  • Posts: 82
  • Country: in
    • Win32NASM
Re: My first attempt using nasm and adding two vectors!
« Reply #3 on: December 25, 2012, 05:12:10 AM »
Quote
I'm guessing that msvc2012 inlines the code and removes the actual function call?

I guess the VC compiler won't inline unless we ask it to.
I used VS2005, though.

BTW, I was only able to view the MM0-MM7 registers. I was not able to view XMM0 to XMM7. Any ideas, anyone?

Thanks,
Mathi.