Author Topic: Win64 SSE Vector2D Speed Test (GCC, NASM) (Read 20443 times)

encryptor256 · « **on:** March 17, 2014, 06:49:08 AM »

Hello,
This is a Win64 SSE Vector2D speed test (Assembly, C, C Assembly, C Assembly Macros).

Notes:
The test is performed in a MINGW64 -> GCC -> C environment.
The test is based on twelve vector functions.
There are four sets of these functions.
Each set is defined in a different way.
The test is time-based, measured in milliseconds.

Vector2D base functions:

Code: [Select]

typedef struct tagVector2D
{
	double X;
	double Y;
}Vector2D;

extern char Vector2DAddD(Vector2D*,double,double);
extern char Vector2DSubD(Vector2D*,double,double);
extern char Vector2DMulD(Vector2D*,double,double);
extern char Vector2DDivD(Vector2D*,double,double);
extern char Vector2DAddV(Vector2D*,Vector2D*);
extern char Vector2DSubV(Vector2D*,Vector2D*);
extern char Vector2DMulV(Vector2D*,Vector2D*);
extern char Vector2DDivV(Vector2D*,Vector2D*);
extern char Vector2DMagnitude(Vector2D*,double*);
extern char Vector2DNormalize(Vector2D*);
extern char Vector2DDotProduct(Vector2D*,Vector2D*,double*);
extern char Vector2DNormal(Vector2D*);

First set (No prefix):
Functions are written in NASM, assembled into an obj file, then linked into the test project.
Located in: "vector2d.asm".
Compile: "nasm.exe -f win64 -o vector2d.obj vector2d.asm".
Note: Code is brand new and serves as the base for the other function sets.
Preview of first function: Vector2DAddD.

Code: [Select]

align 16
Vector2DAddD:
	test rcx,rcx
	setz al
	jnz .proceed
	ret
.proceed:
	movapd xmm0,[rcx]
	movlhps xmm1,xmm2
	addpd xmm0,xmm1
	movapd [rcx],xmm0
	ret
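A side note on the movapd in the preview above: it requires its memory operand to sit on a 16-byte boundary and faults otherwise, while a struct of two doubles only asks for 8-byte alignment by default. Whether the locals in the test happen to be aligned depends on how the compiler lays out the stack. Here is a minimal sketch, assuming C11, of how the alignment could be made explicit. The names AlignedVector2D and IsAligned16 are made up for illustration and are not part of the test project.

```c
#include <stdint.h>
#include <stdalign.h>

/* Hypothetical variant of the post's struct, forced to 16-byte alignment (C11). */
typedef struct tagAlignedVector2D
{
	alignas(16) double X;	/* aligning the first member aligns the whole struct */
	double Y;
} AlignedVector2D;

/* Returns 1 if p sits on a 16-byte boundary, 0 otherwise. */
int IsAligned16(const void *p)
{
	return ((uintptr_t)p % 16u) == 0;
}
```

With a type like this, every object of the type is placed on a 16-byte boundary, so an aligned load or store on it should not fault. The alternative would be movupd, which accepts any address.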

Second set (Prefix Z):
Defined as C Style functions.
Located in: "ZFunctions.c".
Note: Code closely follows the first set.
Preview of first function: ZVector2DAddD.

Code: [Select]

char ZVector2DAddD(Vector2D * v0,double pX,double pY)
{
	if(v0==NULL) return 1;
	v0->X+=pX;
	v0->Y+=pY;
	return 0;
}
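For comparison with the C-style function above and the NASM preview of the first set, the same packed add could also be written in C with SSE2 intrinsics. This is only a sketch and was not part of the measured test. The name IVector2DAddD is hypothetical, and the unaligned load/store variants are my assumption so that the code does not depend on how the struct is aligned.

```c
#include <stddef.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>	/* SSE2 intrinsics */
#endif

typedef struct tagVector2D
{
	double X;
	double Y;
} Vector2D;

/* Hypothetical name; same contract as the previews: 1 on NULL, 0 on success. */
char IVector2DAddD(Vector2D *v0, double pX, double pY)
{
	if (v0 == NULL) return 1;
#if defined(__SSE2__) || defined(_M_X64)
	__m128d v = _mm_loadu_pd(&v0->X);	/* unaligned load of {X, Y} */
	__m128d p = _mm_set_pd(pY, pX);		/* low lane = pX, high lane = pY */
	_mm_storeu_pd(&v0->X, _mm_add_pd(v, p));	/* packed add, store back */
#else
	/* Scalar fallback for targets without SSE2. */
	v0->X += pX;
	v0->Y += pY;
#endif
	return 0;
}
```

The _mm_set_pd call plays the role of the movlhps in the NASM version: it packs the two scalar arguments into one register so a single addpd handles both lanes.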

Third set (Prefix P):
Defined as C Assembly functions.
Located in: "main.c".
Note: Code is based on the first set.
Preview of first function: PVector2DAddD.

Code: [Select]

asm("							\n\
PVector2DAddD:					\n\
	test %rcx,%rcx				\n\
	setz %al					\n\
	jnz .proceed0				\n\
	ret							\n\
.proceed0:						\n\
	movapd 0x0(%rcx),%xmm0		\n\
	movlhps %xmm2,%xmm1			\n\
	addpd %xmm1,%xmm0			\n\
	movapd %xmm0,0x0(%rcx)		\n\
	ret							\n\
");

Fourth set (Prefix M):
Defined as C Assembly macros, closer to direct inline code.
Located in: "main.c".
Note: Code is based on the third set.
Preview of one macro: MVector2DAddV.

Code: [Select]

#define MVector2DAddV(v0,v1) 			asm("movapd %1,%%xmm0; movapd %2,%%xmm1; addpd %%xmm1,%%xmm0; movapd %%xmm0,%0;" :"=m" (v0) :"m" (v0), "m" (v1) :"%xmm0", "%xmm1");
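What the macro above does: it loads v0 into xmm0 and v1 into xmm1, adds them as packed doubles, and stores the result back into v0. Unlike the function versions there is no NULL check and no return value, and no call is made. As a rough plain-C counterpart, here is a sketch of the same operation as a static inline function. The name SVector2DAddV is hypothetical and this variant was not measured in the test.

```c
typedef struct tagVector2D
{
	double X;
	double Y;
} Vector2D;

/* Hypothetical name; plain-C counterpart of MVector2DAddV: v0 += v1, lane by lane. */
static inline void SVector2DAddV(Vector2D *v0, const Vector2D *v1)
{
	v0->X += v1->X;
	v0->Y += v1->Y;
}
```

A call would look like SVector2DAddV(&v0,&v1). An optimizing compiler is free to inline a function like this at the call site, which would also avoid the call overhead, but whether it does so depends on the compiler and flags.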

Each function set has its own test loop function.
Located in: "main.c".

Code: [Select]

void noprefixLoop(void);
void ZprefixLoop(void);
void PprefixLoop(void);
void MprefixLoop(void);

Base test loop (No prefix):

Code: [Select]

void noprefixLoop(void)
{
	unsigned long counter=0xfffffff;
	clock_t timeStart;
	double vardouble;
	Vector2D v0,v1;
	clock_t time;
	char result;

	v1 = (Vector2D){2000.0,3000.00};	
	v0 = (Vector2D){100000.00,150000.0};

	timeStart=clock();

	printf("\r\n TEST: noprefixLoop (vector2d.obj)");

	while(counter>0)
	{
		//
		Vector2DAddD(&v0,v1.X,v1.Y);	
		Vector2DSubD(&v0,v1.X,v1.Y);
		Vector2DMulD(&v0,v1.X,v1.Y);	
		Vector2DDivD(&v0,v1.X,v1.Y);
		//
		Vector2DAddV(&v0,&v1);
		Vector2DSubV(&v0,&v1);
		Vector2DMulV(&v0,&v1);
		Vector2DDivV(&v0,&v1);
		//
		v0 = (Vector2D){10.0,15.0};
		Vector2DMagnitude(&v0,&vardouble);
		v0 = (Vector2D){10.0,15.0};
		Vector2DNormalize(&v0);
		v0 = (Vector2D){10.0,15.0};
		v1 = (Vector2D){2.0,3.0};
		Vector2DDotProduct(&v0,&v1,&vardouble);
		v0 = (Vector2D){10.0,15.0};
		Vector2DNormal(&v0);
		//
		counter=counter-1;
	}

	time = clock() - timeStart;
	/* clock_t is not guaranteed to be int; cast to long to match %ld */
	printf("\r\nTime spent: %ld",(long)time);
}
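A note on the units in the loop above: clock() returns ticks, and the C standard says to divide by CLOCKS_PER_SEC to get seconds. On the Windows C runtime CLOCKS_PER_SEC is commonly 1000, so the raw tick count can be read as milliseconds there, but that does not hold on every platform. A small sketch of a portable conversion follows. The helper name TicksToMilliseconds is made up for illustration.

```c
#include <time.h>

/* Hypothetical helper: converts a clock() tick difference to milliseconds. */
double TicksToMilliseconds(clock_t start, clock_t end)
{
	return (double)(end - start) * 1000.0 / (double)CLOCKS_PER_SEC;
}
```

In the loop this would be used as TicksToMilliseconds(timeStart, clock()) and printed with %f.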

Test Project files:

vector2d.asm - obj
main.h
ZFunctions.c
main.c

Compile:
"gcc main.c vector2d.obj ZFunctions.c -Ofast"

!!! Test results !!!:

Code: [Select]


 TEST: noprefixLoop (vector2d.obj)
Time spent: 48467
 TEST: ZprefixLoop (C Style functions)
Time spent: 37187
 TEST: PprefixLoop (Defined as Plain Assembly Functions)
Time spent: 48342
 TEST: MprefixLoop (Defined as Inline C Assembly Macros)
Time spent: 3656
END

After calling twelve vector functions 0xfffffff (268,435,455) times:
The fastest are the "M prefix functions", defined as C Assembly macros, at 3656 milliseconds,
because the code is almost raw and inline.
This looks more like an environment issue.
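To put the totals into perspective, they can be broken down per loop iteration. This is just arithmetic on the numbers printed above, sketched as two small helpers. The names NsPerIteration and Speedup are hypothetical.

```c
/* Iterations per loop in the test: 0xfffffff = 268,435,455. */
#define ITERATIONS 0xfffffffUL

/* Average nanoseconds per loop iteration for a total given in milliseconds. */
double NsPerIteration(double totalMs)
{
	return totalMs * 1.0e6 / (double)ITERATIONS;
}

/* How many times faster fastMs is compared to slowMs. */
double Speedup(double slowMs, double fastMs)
{
	return slowMs / fastMs;
}
```

With the results above, NsPerIteration(48467.0) comes out at roughly 181 ns per iteration for the no-prefix set and NsPerIteration(3656.0) at roughly 14 ns for the M-prefix set, and Speedup(48467.0, 3656.0) is about 13. Each iteration covers all twelve calls plus the struct assignments in between.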

Added attachment.

I tested on:

Code: [Select]

Processors Information
-------------------------------------------------------------------------

Processor 1			ID = 0
	Number of cores		2 (max 2)
	Number of threads	2 (max 2)
	Name			Intel Core 2 Duo E6400
	Codename		Conroe
	Specification		Intel(R) Core(TM)2 CPU          6400  @ 2.13GHz
	Package (platform ID)	Socket 775 LGA (0x0)
	CPUID			6.F.6
	Extended CPUID		6.F
	Core Stepping		B2
	Technology		65 nm
	Core Speed		2133.1 MHz
	Multiplier x Bus Speed	8.0 x 266.6 MHz
	Rated Bus speed		1066.6 MHz
	Stock frequency		2133 MHz
	Instructions sets	MMX, SSE, SSE2, SSE3, SSSE3, EM64T, VT-x
	L1 Data cache		2 x 32 KBytes, 8-way set associative, 64-byte line size
	L1 Instruction cache	2 x 32 KBytes, 8-way set associative, 64-byte line size
	L2 cache		2048 KBytes, 8-way set associative, 64-byte line size
	FID/VID Control		yes
	FID range		6.0x - 8.0x
	Max VID			1.325 V

After doing this test and getting these results,
I don't want to deal with functions anymore.

This test revealed that inline code is much faster than calling functions, especially when time matters more than code size.

But GCC doesn't stand a chance against NASM.
In NASM we can use raw brain power.

Bye!
