Topic: What makes NASM slower than C? (Read 25772 times)

ngocanh198 · « **on:** September 05, 2010, 03:38:59 AM »

I wrote the same loop in C and in NASM.
In C:

    kq=0;
    for (i=n-1;i>=2;i--)
        for (j=i-1;j>=1;j--)
            for (k=j-1;k>=0;k--)
                kq+=fabs((x[j]-x[i])*(y[k]-y[i])-(x[k]-x[i])*(y[j]-y[i]));

And in NASM:

	xor eax,eax
	mov [kq],eax
	mov [kq+4],eax
	mov [kq+8],ax
	fbld [kq]
	fwait
	mov ax,[n]
	dec ax
	mov [i],ax
FOR_i:;for i=n-1 downto 2
	mov ax,[i]
	dec ax
	mov [j],ax
	mov esi,0
	mov si,[i]
	shl esi,1
	mov ax,[x+esi]
	mov bx,[y+esi]
	mov [xi],ax
	mov [yi],bx
FOR_j:;for j=i-1 downto 1
	mov ax,[j]
	dec ax
	mov [k],ax
	mov esi,0
	mov si,[j]
	shl esi,1
	mov ax,[x+esi]
	mov bx,[y+esi]
	mov [xj],ax
	mov [yj],bx
	FOR_k:
		mov esi,0
		mov si,[k]
		shl esi,1
		mov ax,[x+esi]
		mov bx,[y+esi]
		mov [xk],ax
		mov [yk],bx
		;
		fild word [xj]
		fild word [xi]
		fsub
		fild word [yk]
		fild word [yi]
		fsub
		fmul
		fild word [xk]
		fild word [xi]
		fsub
		fild word [yj]
		fild word [yi]
		fsub
		fmul
		fsub
		fabs
		fadd
		fwait
		;
		cmp word [k],0
		je END_FOR_k
		dec word [k]
		jmp FOR_k
	END_FOR_k:
	cmp word [j],1
	je END_FOR_j
	dec word [j]
	jmp FOR_j
END_FOR_j:
	cmp word [i],2
	je END_FOR_i
	dec word [i]
	jmp FOR_i
END_FOR_i:
	mov word [xi],5
	fild word [xi]
	fmul
	fbstp [kq]
	fwait

With n = 1000, the C version runs in 2.69 s and the NASM version in 6.97 s.
What makes NASM slower than C, and how can I speed up this code?
Thanks very much

Rob Neff · « **Reply #1 on:** September 05, 2010, 05:49:23 AM »

Not being able to write optimized assembly code is the issue here, not Nasm.
My suggestion: enable assembly listing output for your C program in your compiler and compare the routine in question with your assembly implementation. That will give you good hints about where your assembly code went wrong...

Frank Kotler · « **Reply #2 on:** September 05, 2010, 05:56:04 AM »

What makes Nasm slower than C? You do! Nasm does absolutely nothing to make your code any faster or any slower than what you write.

As to how to make the code faster... "fbld" and "fbstp" being dog slow is the first thing that comes to mind. Why are you doing it that way? The mixture of 16- and 32-bit code can't help. Unless you're trapped in a medieval dungeon with nothing but a 286, write 32-bit code, for gossake! You appear to be doing more "fild"s than "fstp"s - does this even work? I'm surprised you don't overflow your FPU stack and crash! Is there a reason for the "fwait"s where you've got 'em?

Without the data declarations, it's hard to see just what you're trying to do. Looks like mostly integer arithmetic(?)... I haven't got the time/inclination to wrestle with this right now. If you post the complete C code - something that will compile - I'll try to get around to it... or you can do it yourself: compile it to an executable (mine will be for Linux) and disassemble it with Agner Fog's "objconv" ("-fnasm")...

http://www.agner.org/optimize

You'll have to scroll down a ways to get to your code (there's an amazing amount of "cruft" in a C program, but it doesn't seem to harm it). See what the compiler does that's so much faster. As an alternative, get the compiler to spit out assembly, and look at that. It may be hard to beat. Those compiler authors aren't dumb, and they've been at it for a while. They probably understand the FPU better than you and I do. But maybe we can do still better - no harm trying. Designing a good FPU algorithm isn't easy! FPU is perhaps "obsolete" anyway - maybe we should be using SSE or so. I have no clue how that would work. If we even need FPU... looks like a lot of integer math to me...

But post something with the data declarations. I'm not going to guess.

Best,
Frank

P.S. I see Rob just said the same thing, more succinctly.

NASM - The Netwide Assembler
