NASM - The Netwide Assembler

NASM Forum => Programming with NASM => Topic started by: manler on December 23, 2012, 09:00:33 PM

Title: My first attempt using nasm and adding two vectors!
Post by: manler on December 23, 2012, 09:00:33 PM: Hello everybody!

This is my first real attempt at using nasm to create a function that I link into a c++ program.
The function adds two vectors of integers and returns the result. So it isn't the world's most useful function, but I wrote it to get the hang of nasm. It uses SSE3 instructions.

Am I doing the right things here? I mean things like:
register use?
how to set up & tear down a function? enter/leave?
reading args passed to the function?

Please give me some feedback.
My main goal with using nasm is to create functions that link into c++ programs.

In C++ the function looks like this:
Code: [Select]
int addvectors_c(int *vec1, int *vec2, int elements)
{
    int sum = 0;
    for(int i = 0; i<elements; ++i)
    {
        sum += vec1[i] + vec2[i];
    }
    return sum;
}
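For the stated goal of linking an assembly routine into a C++ program, the declaration on the C++ side normally needs C linkage so that the compiler does not mangle the name. A minimal sketch, assuming a 32-bit cdecl target where the toolchain prepends an underscore to C symbols (so the asm label `_addvectors` corresponds to the C name `addvectors`). The C++ body below is only a hypothetical stand-in so the example builds without the NASM object file:

```cpp
#include <cassert>

// C linkage: no C++ name mangling, so the linker looks for "addvectors"
// (decorated as "_addvectors" on 32-bit Windows targets, matching the asm label).
extern "C" int addvectors(int *vec1, int *vec2, int elements);

// Hypothetical stand-in for the assembled routine. In a real build this
// definition is removed and the NASM object file is linked in its place.
extern "C" int addvectors(int *vec1, int *vec2, int elements) {
    assert(elements >= 0);
    int sum = 0;
    for (int i = 0; i < elements; ++i) {
        sum += vec1[i] + vec2[i];
    }
    return sum;
}
```

Without `extern "C"`, the C++ compiler would emit a mangled symbol name and the linker would not find the label exported by the assembly file.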
In Nasm the function looks like this:

Code: [Select]
segment .data

segment .bss

segment .text
        global  _addvectors

;
; int addvectors(int *vec1, int *vec2, int elements);
;
_addvectors:
        enter   0,0
        pusha

        mov     esi, [ebp+8]    ; first parameter vec1
        mov     ecx, [ebp+12]   ; second parameter vec2
        mov     edx, [ebp+16]   ; third parameter elements

        shl     edx, 2          ; convert to number of bytes since integer is 4 bytes.
        add     edx, esi        ; calculate end of vec1
        mov     edi, esi
        sub     edi, 16         ; subtract 16 (one lddqu read) to find last index before we start reading 4 bytes at a time

        pxor    xmm0, xmm0
        pxor    xmm7, xmm7

        ; while(esi <= edi)
.more_bytes:
        cmp     edi, esi
        jl      .remaining_bytes

        lddqu   xmm1, [esi]     ; Using unaligned load (also tried movdqu)
        lddqu   xmm2, [ecx]
        paddd   xmm0, xmm1      ; Add to xmm0
        paddd   xmm0, xmm2      ; Add to xmm0

        add     esi, 16
        add     ecx, 16
        jmp     .more_bytes

.remaining_bytes:
        cmp     edx, esi
        jle     .calc_sum

        movd    xmm6, [esi]
        movd    xmm7, [ecx]
        paddd   xmm0, xmm6
        paddd   xmm0, xmm7
        add     esi, 4
        jmp     .remaining_bytes

.calc_sum:
        popa
        phaddd  xmm0, xmm7      ; Do horizontal add of xmm0
        phaddd  xmm0, xmm7      ; Horizontal add finished
        pextrd  eax, xmm0, 0    ; Extract sum and put into eax for return value

        leave
        ret
Just as a side note, the asm version seems to be faster when the vectors are larger than 1024*800 bytes. Otherwise the C++ version is faster, especially when the functions are called many times with small vectors; then C++ is much faster. I'm guessing that msvc2012 inlines the code and removes the actual function call?

Thank you for any tips and help!

/Mathias
Title: Re: My first attempt using nasm and adding two vectors!
Post by: Rob Neff on December 24, 2012, 05:00:23 PM: I would eliminate the enter/leave and the pusha/popa instructions. As per the C calling convention (which is used by C++), the only registers used by your function that must be non-volatile (appear unchanged to the calling function) are ESI and EDI. Thus the framework for function entry and exit could be as follows:
Code: [Select]
_addvectors:
        push    edi
        push    esi

        ; No frame pointer is set up here, so the arguments are addressed
        ; from esp: 2 saved registers + return address = 12 bytes.
        mov     esi, [esp+12]   ; first parameter vec1
        mov     ecx, [esp+16]   ; second parameter vec2
        mov     edx, [esp+20]   ; third parameter elements
        .
        .                       ; implementation
        .
        pextrd  eax, xmm0, 0    ; Extract sum and put into eax for return value

        pop     esi
        pop     edi
        ret
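The offsets 12, 16 and 20 follow from the stack layout after the two pushes when measured from ESP, while a frame set up with `enter 0,0` puts the same arguments at 8, 12 and 16 from EBP, as in the original post. A small sketch of that arithmetic, assuming 32-bit cdecl with 4-byte stack slots; the helper names are made up for illustration:

```cpp
#include <cassert>

// Offset from ESP of a cdecl argument when no frame pointer is used.
// Layout from ESP upward: registers pushed by the function, then the
// return address pushed by CALL, then the arguments in order.
constexpr int arg_offset_from_esp(int pushed_regs, int arg_index) {
    return pushed_regs * 4 + 4 + arg_index * 4;
}

// Offset from EBP after "push ebp / mov ebp, esp" (or "enter 0,0"):
// saved EBP at [ebp], return address at [ebp+4], first argument at [ebp+8].
constexpr int arg_offset_from_ebp(int arg_index) {
    return 8 + arg_index * 4;
}

static_assert(arg_offset_from_esp(2, 0) == 12, "vec1 after two pushes");
static_assert(arg_offset_from_ebp(0) == 8, "vec1 with a standard frame");
```

Any further push inside the function moves the ESP-relative offsets by another 4 bytes, which is one reason a fixed EBP frame is easier to keep track of by hand.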
In the .more_bytes: loop this doesn't look right to me:
Code: [Select]
add      esi, 16
add      ecx, 16
Are you sure you don't mean:
Code: [Select]
add      esi, 4      ; point to next vec1 int
add      ecx, 4      ; point to next vec2 int
Have you verified that your vector addition function is arithmetically correct?
Title: Re: My first attempt using nasm and adding two vectors!
Post by: Mathi on December 25, 2012, 03:59:13 AM: Regarding the validity,
The initialization,
Code: [Select]
add      edx, esi      ; calculate end of vec1
mov      edi, esi
sub      edi, 16 ; subtract 16 (one lddqu read) to find last index before we start reading 4 bytes at a time
should have been

add      edx, esi      ; calculate end of vec1
mov      edi, edx
sub      edi, 16 ; subtract 16 (one lddqu read) to find last index before we start reading 4 bytes at a time

We also need to increment ecx in the .remaining_bytes loop:

paddd   xmm0, xmm6
paddd   xmm0, xmm7
add      esi, 4
add      ecx, 4
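The effect of the edi initialization can be checked by modelling only the loop control. The sketch below uses byte offsets in place of real addresses and counts how often each loop body would run. The struct and function names are made up for illustration, and the comparisons mirror the `cmp`/`jl` and `cmp`/`jle` pairs from the routine:

```cpp
#include <cassert>

struct LoopCounts {
    int simd_iterations;    // passes through .more_bytes (16 bytes each)
    int scalar_iterations;  // passes through .remaining_bytes (4 bytes each)
};

// Models the loop control only. `corrected` selects "mov edi, edx"
// instead of the original "mov edi, esi".
LoopCounts count_iterations(int elements, bool corrected) {
    assert(elements >= 0);
    int esi = 0;                                   // start of vec1
    const int edx = esi + elements * 4;            // end of vec1
    const int edi = (corrected ? edx : esi) - 16;  // last offset for a 16-byte load

    LoopCounts counts = {0, 0};
    while (esi <= edi) {            // cmp edi, esi / jl .remaining_bytes
        ++counts.simd_iterations;
        esi += 16;
    }
    while (esi < edx) {             // cmp edx, esi / jle .calc_sum
        ++counts.scalar_iterations;
        esi += 4;
    }
    return counts;
}
```

With the original initialization, edi is always 16 below esi, so the 16-byte loop never runs and every element goes through the 4-byte loop. With the correction, 10 elements give two 16-byte passes and two 4-byte passes.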

Rob,
Quote
In the .more_bytes: loop this doesn't look right to me:
Code: [Select]

add      esi, 16
add      ecx, 16

I think it is correct. The idea here is to add 4 integers at a time using paddd, so both pointers advance by 16 bytes per pass.
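The four-at-a-time idea can be sketched in portable C++ by treating xmm0 as four independent 32-bit lanes. This is only a model of the data flow, not intrinsics code, and the function name is made up for illustration:

```cpp
#include <array>
#include <cassert>

// Portable model of the SIMD routine: four lane accumulators stand in
// for xmm0, and each pass of the main loop consumes 4 ints (16 bytes).
int addvectors_lanes(const int *vec1, const int *vec2, int elements) {
    assert(elements >= 0);
    std::array<int, 4> acc = {{0, 0, 0, 0}};   // pxor xmm0, xmm0

    int i = 0;
    for (; i + 4 <= elements; i += 4) {        // add esi, 16 / add ecx, 16
        for (int lane = 0; lane < 4; ++lane) {
            acc[lane] += vec1[i + lane];       // paddd xmm0, xmm1
            acc[lane] += vec2[i + lane];       // paddd xmm0, xmm2
        }
    }
    for (; i < elements; ++i) {                // leftovers, 4 bytes at a time
        acc[0] += vec1[i] + vec2[i];           // movd loads into the low lane
    }
    // Horizontal add: fold the four lanes into a single sum.
    return acc[0] + acc[1] + acc[2] + acc[3];
}
```

Because each lane only ever sees every fourth element, stepping the pointers by 4 bytes in the main loop would add most elements several times.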

Obviously the C compiler didn't use the xmm registers.
But I guess there is still scope for improving performance in the asm routine.
All the Best!

Regards,
Mathi.
Title: Re: My first attempt using nasm and adding two vectors!
Post by: Mathi on December 25, 2012, 05:12:10 AM: Quote
I'm guessing that msvc2012 inlines the code and removes the actual function call?

I guess the VC compiler won't inline unless we ask it to.
I used VS2005, though.

BTW, I was able to view only the MM0 - MM7 registers; I was not able to view xmm0 to xmm7. Any idea, anyone?

Thanks,
Mathi.