NASM - The Netwide Assembler

NASM Forum => Example Code => Topic started by: encryptor256 on March 17, 2014, 06:49:08 AM

Title: Win64 SSE Vector2D Speed Test (GCC, NASM)
Post by: encryptor256 on March 17, 2014, 06:49:08 AM

Hello,
This is a Win64 SSE Vector2D speed test (Assembly, C, C Assembly, C Assembly Macros).

Notes:
The test is performed in a MINGW64 -> GCC -> C environment.
The test is based on twelve vector functions.
There are four sets of these functions.
Each set is defined in a different way.
The test is time-based, measured in milliseconds.

Vector2D base functions:

Code:

typedef struct tagVector2D
{
	double X;
	double Y;
}Vector2D;

extern char Vector2DAddD(Vector2D*,double,double);
extern char Vector2DSubD(Vector2D*,double,double);
extern char Vector2DMulD(Vector2D*,double,double);
extern char Vector2DDivD(Vector2D*,double,double);
extern char Vector2DAddV(Vector2D*,Vector2D*);
extern char Vector2DSubV(Vector2D*,Vector2D*);
extern char Vector2DMulV(Vector2D*,Vector2D*);
extern char Vector2DDivV(Vector2D*,Vector2D*);
extern char Vector2DMagnitude(Vector2D*,double*);
extern char Vector2DNormalize(Vector2D*);
extern char Vector2DDotProduct(Vector2D*,Vector2D*,double*);
extern char Vector2DNormal(Vector2D*);

First set (No prefix):
Functions are written in NASM, assembled into an obj file, then linked into the test project.
Located in: "vector2d.asm".
Compile: "nasm.exe -f win64 -o vector2d.obj vector2d.asm".
Note: The code is brand new and serves as the base for the other function sets.
Preview of first function: Vector2DAddD.

Code:

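align 16
Vector2DAddD:
	test rcx,rcx		; rcx = Vector2D* (first Win64 argument)
	setz al			; al = 1 if the pointer is NULL, else 0
	jnz .proceed
	ret			; NULL pointer: return 1
.proceed:
	movapd xmm0,[rcx]	; xmm0 = {X, Y}; [rcx] must be 16-byte aligned
	movlhps xmm1,xmm2	; xmm1 = {pX, pY}: pY (xmm2) goes into the high half
	addpd xmm0,xmm1		; add both lanes at once
	movapd [rcx],xmm0	; store the result back
	ret			; al is still 0 here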
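For readers more at home in C, the same steps can be sketched with SSE2 intrinsics. This is only an illustration and not part of the test project: the name IVector2DAddD is made up here, and it uses unaligned loads and stores (_mm_loadu_pd, _mm_storeu_pd), so it does not share the 16-byte alignment requirement of movapd. A scalar fallback is included for targets without SSE2.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>	/* SSE2 intrinsics */
#endif

typedef struct tagVector2D
{
	double X;
	double Y;
}Vector2D;

/* Hypothetical intrinsic counterpart of Vector2DAddD (not in the original project). */
char IVector2DAddD(Vector2D * v0,double pX,double pY)
{
	if(v0==NULL) return 1;
#if defined(__SSE2__)
	__m128d v = _mm_loadu_pd(&v0->X);	/* v = {X, Y}, unaligned load */
	__m128d p = _mm_set_pd(pY,pX);		/* p = {pX, pY}; _mm_set_pd takes the high lane first */
	_mm_storeu_pd(&v0->X,_mm_add_pd(v,p));	/* both lanes added at once, like addpd */
#else
	v0->X+=pX;	/* scalar fallback when SSE2 is not available */
	v0->Y+=pY;
#endif
	return 0;
}
```

Packing pX and pY with _mm_set_pd plays the role of movlhps xmm1,xmm2 in the assembly version.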

Second set (Prefix Z):
Defined as C Style functions.
Located in: "ZFunctions.c".
Note: Code is roughly based on the first set.
Preview of first function: ZVector2DAddD.

Code:

char ZVector2DAddD(Vector2D * v0,double pX,double pY)
{
	if(v0==NULL) return 1;
	v0->X+=pX;
	v0->Y+=pY;
	return 0;
}

Third set (Prefix P):
Defined as C Assembly functions.
Located in: "main.c".
Note: Code is based on the first set.
Preview of first function: PVector2DAddD.

Code:

asm("							\n\
PVector2DAddD:					\n\
	test %rcx,%rcx				\n\
	setz %al					\n\
	jnz .proceed0				\n\
	ret							\n\
.proceed0:						\n\
	movapd 0x0(%rcx),%xmm0		\n\
	movlhps %xmm2,%xmm1			\n\
	addpd %xmm1,%xmm0			\n\
	movapd %xmm0,0x0(%rcx)		\n\
	ret							\n\
");

Fourth set (Prefix M):
Defined as C Assembly Macros, which expand to direct inline code at each call site.
Located in: "main.c".
Note: Code is based on the third set.
Preview of one function: MVector2DAddV.

Code:

#define MVector2DAddV(v0,v1) 			asm("movapd %1,%%xmm0; movapd %2,%%xmm1; addpd %%xmm1,%%xmm0; movapd %%xmm0,%0;" :"=m" (v0) :"m" (v0), "m" (v1) :"%xmm0", "%xmm1");
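One thing worth knowing when timing macros like this: GCC may delete or hoist an asm statement that has outputs and is not marked volatile, if it decides the outputs are not needed. Below is a hedged sketch of a variant that guards against that. It is not the code that was benchmarked; the name SVector2DAddV is hypothetical, it uses a single "+m" read-write operand instead of a separate output and input, and it uses movupd so the operands do not have to be 16-byte aligned. A plain C fallback covers compilers or targets without GCC-style x86-64 inline assembly.

```c
typedef struct tagVector2D
{
	double X;
	double Y;
}Vector2D;

/* Hypothetical variant of MVector2DAddV: v0 += v1, lane by lane. */
static inline void SVector2DAddV(Vector2D * v0,const Vector2D * v1)
{
#if defined(__GNUC__) && defined(__x86_64__)
	__asm__ volatile(			/* volatile: never removed or moved out of a loop */
		"movupd %0,%%xmm0\n\t"		/* xmm0 = *v0 */
		"movupd %1,%%xmm1\n\t"		/* xmm1 = *v1 */
		"addpd %%xmm1,%%xmm0\n\t"	/* add both lanes */
		"movupd %%xmm0,%0"		/* *v0 = xmm0 */
		: "+m" (*v0)			/* read and written */
		: "m" (*v1)			/* read only */
		: "xmm0", "xmm1");		/* registers the code overwrites */
#else
	v0->X+=v1->X;	/* fallback without inline assembly */
	v0->Y+=v1->Y;
#endif
}
```

Whether the original macro was affected by this in the test loop is not something the post shows; the sketch only illustrates how to rule it out.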

Each function set has its own test loop function.
Located in: "main.c".

Code:

void noprefixLoop(void);
void ZprefixLoop(void);
void PprefixLoop(void);
void MprefixLoop(void);

Base test loop (No prefix):

Code:

void noprefixLoop(void)
{
	unsigned long counter=0xfffffff;
	clock_t timeStart;
	double vardouble;
	Vector2D v0,v1;
	clock_t time;

	v1 = (Vector2D){2000.0,3000.00};	
	v0 = (Vector2D){100000.00,150000.0};

	timeStart=clock();

	printf("\r\n TEST: noprefixLoop (vector2d.obj)");

	while(counter>0)
	{
		//
		Vector2DAddD(&v0,v1.X,v1.Y);	
		Vector2DSubD(&v0,v1.X,v1.Y);
		Vector2DMulD(&v0,v1.X,v1.Y);	
		Vector2DDivD(&v0,v1.X,v1.Y);
		//
		Vector2DAddV(&v0,&v1);
		Vector2DSubV(&v0,&v1);
		Vector2DMulV(&v0,&v1);
		Vector2DDivV(&v0,&v1);
		//
		v0 = (Vector2D){10.0,15.0};
		Vector2DMagnitude(&v0,&vardouble);
		v0 = (Vector2D){10.0,15.0};
		Vector2DNormalize(&v0);
		v0 = (Vector2D){10.0,15.0};
		v1 = (Vector2D){2.0,3.0};
		Vector2DDotProduct(&v0,&v1,&vardouble);
		v0 = (Vector2D){10.0,15.0};
		Vector2DNormal(&v0);
		//
		counter=counter-1;
	}

	time = clock() - timeStart;
	/* clock_t is not int; cast so the value matches the format specifier */
	printf("\r\nTime spent: %ld",(long)time);
}
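The loop prints raw clock() ticks. On the MinGW runtime CLOCKS_PER_SEC is typically 1000, so ticks and milliseconds coincide, but that is not true everywhere. A small sketch of a portable conversion follows; the helper name ticks_to_ms is an assumption and is not in the test project.

```c
#include <time.h>

/* Hypothetical helper: convert a clock() tick difference to milliseconds. */
double ticks_to_ms(clock_t ticks)
{
	return (double)ticks*1000.0/(double)CLOCKS_PER_SEC;
}
```

It could be used as printf("\r\nTime spent: %.0f ms",ticks_to_ms(clock()-timeStart)); in place of the raw tick count.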

Test Project files:

vector2d.asm - obj
main.h
ZFunctions.c
main.c

Compile:
"gcc main.c vector2d.obj ZFunctions.c -Ofast"

!!! Test results !!!:

Code:


 TEST: noprefixLoop (vector2d.obj)
Time spent: 48467
 TEST: ZprefixLoop (C Style functions)
Time spent: 37187
 TEST: PprefixLoop (Defined as Plain Assembly Functions)
Time spent: 48342
 TEST: MprefixLoop (Defined as Inline C Assembly Macros)
Time spent: 3656
END

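After calling twelve vector functions, 0xfffffff times:
The fastest are the "M prefix functions" defined as C Assembly Macros, at 3656 milliseconds,
because the code is almost raw and inline, with no call overhead.
This says more about the build environment than about the instructions themselves: the first and third sets run the same instructions and finish within about 0.3% of each other.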
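To put the figures in proportion, the arithmetic can be sketched as two small helpers. The names speedup and ns_per_call are made up for this illustration. The per-call number is only an average over the whole loop, which also contains the struct assignments, so it overstates the cost of a single call somewhat.

```c
/* Ratio of two timings; a value above 1 means the second one was faster. */
double speedup(double slow_ms,double fast_ms)
{
	return slow_ms/fast_ms;
}

/* Average nanoseconds per function call over the whole loop. */
double ns_per_call(double total_ms,unsigned long long iterations,unsigned calls_per_iteration)
{
	double calls=(double)iterations*(double)calls_per_iteration;
	return total_ms*1.0e6/calls;	/* 1 ms = 1.0e6 ns */
}
```

With the results above, speedup(48467,3656) is a little over 13 and speedup(37187,3656) a little over 10. ns_per_call(48467,0xfffffffULL,12) comes to about 15 ns, which is roughly 32 clock cycles at 2133 MHz.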

Added attachment.

I tested on:

Code:

Processors Information
-------------------------------------------------------------------------

Processor 1			ID = 0
	Number of cores		2 (max 2)
	Number of threads	2 (max 2)
	Name			Intel Core 2 Duo E6400
	Codename		Conroe
	Specification		Intel(R) Core(TM)2 CPU          6400  @ 2.13GHz
	Package (platform ID)	Socket 775 LGA (0x0)
	CPUID			6.F.6
	Extended CPUID		6.F
	Core Stepping		B2
	Technology		65 nm
	Core Speed		2133.1 MHz
	Multiplier x Bus Speed	8.0 x 266.6 MHz
	Rated Bus speed		1066.6 MHz
	Stock frequency		2133 MHz
	Instructions sets	MMX, SSE, SSE2, SSE3, SSSE3, EM64T, VT-x
	L1 Data cache		2 x 32 KBytes, 8-way set associative, 64-byte line size
	L1 Instruction cache	2 x 32 KBytes, 8-way set associative, 64-byte line size
	L2 cache		2048 KBytes, 8-way set associative, 64-byte line size
	FID/VID Control		yes
	FID range		6.0x - 8.0x
	Max VID			1.325 V

After doing this test and getting these results,
I don't want to deal with functions anymore. :D
This test revealed that inline code is far faster than calling functions, which matters when time is more important than code size.
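Inline code does not have to mean assembly macros. As a sketch under assumptions: a C function declared static inline in a header that main.c includes gives GCC the chance to expand it at each call site, which a function compiled in a separate file such as ZFunctions.c does not get without link-time optimization. The name IZVector2DAddD is hypothetical, and whether GCC actually inlines it depends on the optimization level.

```c
#include <stddef.h>

typedef struct tagVector2D
{
	double X;
	double Y;
}Vector2D;

/* Hypothetical header-style variant of ZVector2DAddD that the compiler may inline. */
static inline char IZVector2DAddD(Vector2D * v0,double pX,double pY)
{
	if(v0==NULL) return 1;
	v0->X+=pX;
	v0->Y+=pY;
	return 0;
}
```

This variant was not part of the timed test, so no result is claimed for it here.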

But GCC doesn't stand a chance against NASM.
In NASM we can do - raw brain power. :D

Bye!