Author Topic: What make NASM slower than C ?  (Read 9811 times)

ngocanh198

  • New Member
  • Posts: 1
What make NASM slower than C ?
« on: September 05, 2010, 03:38:59 AM »
I wrote the same loop in C and NASM.
In C:
Code:
    kq=0;
    for (i=n-1;i>=2;i--)
        for (j=i-1;j>=1;j--)
            for (k=j-1;k>=0;k--)
                kq+=fabs((x[j]-x[i])*(y[k]-y[i])-(x[k]-x[i])*(y[j]-y[i]));
And in NASM:
Code:
xor eax,eax
mov [kq],eax
mov [kq+4],eax
mov [kq+8],ax
fbld [kq]
fwait
mov ax,[n]
dec ax
mov [i],ax
FOR_i:;for i=n-1 downto 2
mov ax,[i]
dec ax
mov [j],ax
mov esi,0
mov si,[i]
shl esi,1
mov ax,[x+esi]
mov bx,[y+esi]
mov [xi],ax
mov [yi],bx
FOR_j:;for j=i-1 downto 1
mov ax,[j]
dec ax
mov [k],ax
mov esi,0
mov si,[j]
shl esi,1
mov ax,[x+esi]
mov bx,[y+esi]
mov [xj],ax
mov [yj],bx
FOR_k:
mov esi,0
mov si,[k]
shl esi,1
mov ax,[x+esi]
mov bx,[y+esi]
mov [xk],ax
mov [yk],bx
;
fild word [xj]
fild word [xi]
fsub
fild word [yk]
fild word [yi]
fsub
fmul
fild word [xk]
fild word [xi]
fsub
fild word [yj]
fild word [yi]
fsub
fmul
fsub
fabs
fadd
fwait
;
cmp word [k],0
je END_FOR_k
dec word [k]
jmp FOR_k
END_FOR_k:
cmp word [j],1
je END_FOR_j
dec word [j]
jmp FOR_j
END_FOR_j:
cmp word [i],2
je END_FOR_i
dec word [i]
jmp FOR_i
END_FOR_i:
mov word [xi],5
fild word [xi]
fmul
fbstp [kq]
fwait
With n=1000, the C version runs in 2.69 seconds and the NASM version in 6.97 seconds.
What makes NASM slower than C, and how can I speed up this code?
Thanks very much   :)

Rob Neff

  • Forum Moderator
  • Full Member
  • *****
  • Posts: 429
  • Country: us
Re: What make NASM slower than C ?
« Reply #1 on: September 05, 2010, 05:49:23 AM »
Not being able to write optimized assembly code is the issue here, not Nasm.
My suggestion is that you enable assembly listing output of your C program in your compiler and compare the routine in question to your implementation in assembly.  You will get good hints on where you messed up in your assembly code...
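
As a rough illustration of that suggestion, the loop from the question can be wrapped in a small function and compiled to an assembly listing. This is only a sketch: the original post does not show the data declarations, so the `double` arrays, the function name `sum_abs_cross`, and the file name `kernel.c` are all assumptions made for the example.

```c
/* kernel.c: a self-contained version of the loop from the question.
 *
 * Assumptions (not in the original post): x and y are arrays of double,
 * kq is a double, and the function name is made up for this example.
 *
 * To get an optimized assembly listing from GCC in Intel syntax:
 *     gcc -O2 -S -masm=intel kernel.c -o kernel.s
 */
#include <math.h>

double sum_abs_cross(const double *x, const double *y, int n)
{
    double kq = 0.0;
    int i, j, k;

    /* Same loop nest as in the question: every triple i > j > k. */
    for (i = n - 1; i >= 2; i--)
        for (j = i - 1; j >= 1; j--)
            for (k = j - 1; k >= 0; k--)
                kq += fabs((x[j] - x[i]) * (y[k] - y[i])
                         - (x[k] - x[i]) * (y[j] - y[i]));
    return kq;
}
```

The resulting `kernel.s` can then be read side by side with the hand-written routine to see how the compiler handles loads and loop counters.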

Frank Kotler

  • NASM Developer
  • Hero Member
  • *****
  • Posts: 2667
  • Country: us
Re: What make NASM slower than C ?
« Reply #2 on: September 05, 2010, 05:56:04 AM »
What makes Nasm slower than C? You do! Nasm does absolutely nothing to make your code any faster or any slower than what you write.

As to how to make the code faster... the first thing that comes to mind is that "fbld" and "fbstp" are dog slow. Why are you doing it that way? The mixture of 16- and 32-bit code can't help. Unless you're trapped in a medieval dungeon with nothing but a 286, write 32-bit code, for gossake! You appear to be doing more "fild"s than "fstp"s - does this even work? I'm surprised you don't overflow your FPU stack and crash! Is there a reason for the "fwait"s where you've got 'em?

Without the data declarations, it's hard to see just what you're trying to do. Looks like mostly integer arithmetic(?)... I haven't got time/inclination to wrestle with this right now. If you'll post the complete C code - something that will compile - I'll try to get around to... or you can do it yourself... compile it to an executable (mine will be for Linux), and disassemble it with Agner Fog's "objconv" ("-fnasm")...

http://www.agner.org/optimize

You'll have to scroll down a ways to get to your code (there's an amazing amount of "cruft" in a C program, but it doesn't seem to harm it). See what the compiler does that's so much faster. As an alternative, get the compiler to spit out assembly, and look at that. It may be hard to beat. Those compiler authors aren't dumb, and they've been at it for a while. They probably understand the FPU better than you and I do. But maybe we can do still better - no harm trying. Designing a good FPU algorithm isn't easy! FPU is perhaps "obsolete" anyway - maybe should be using SSE or so. I have no clue how that would work. If we even need FPU... looks like a lot of integer math to me...
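
The "lot of integer math" idea can be sketched in C before anyone tries it in assembly. The version below assumes the coordinates are signed 16-bit integers, which is what the `fild word` loads in the posted assembly suggest. The real declarations were not posted, so the types and the function name `sum_abs_cross_int` are assumptions. It also computes the differences that do not depend on `k` once per `j` iteration. Whether it beats the FPU version would have to be measured.

```c
#include <stdint.h>

/* Integer-only sketch of the same sum.
 *
 * Assumptions (not in the original post): x and y hold signed 16-bit
 * values, and the function name is made up for this example.
 *
 * A difference of two int16_t values needs 17 bits, so a product of two
 * differences needs more than 32 bits; everything is widened to int64_t.
 * For n = 1000 the total stays within int64_t, but a much larger n
 * could overflow it.
 */
int64_t sum_abs_cross_int(const int16_t *x, const int16_t *y, int n)
{
    int64_t kq = 0;

    for (int i = n - 1; i >= 2; i--) {
        for (int j = i - 1; j >= 1; j--) {
            /* These do not change inside the k loop. */
            const int64_t dxj = (int64_t)x[j] - x[i];
            const int64_t dyj = (int64_t)y[j] - y[i];

            for (int k = j - 1; k >= 0; k--) {
                const int64_t dxk = (int64_t)x[k] - x[i];
                const int64_t dyk = (int64_t)y[k] - y[i];
                const int64_t cross = dxj * dyk - dxk * dyj;

                kq += (cross < 0) ? -cross : cross;
            }
        }
    }
    return kq;
}
```

If the real data is floating point after all, this sketch does not apply as written, which is one more reason the declarations matter.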

But post something with the data declarations. I'm not going to guess.

Best,
Frank

P.S. I see Rob just said the same thing, more succinctly. :)