What makes Nasm slower than C? You do! Nasm does absolutely nothing to make your code any faster or any slower than what you write.
As to how to make the code faster... the first thing that comes to mind is that "fbld" and "fbstp" are dog slow. Why are you doing it that way? The mixture of 16- and 32-bit code can't help. Unless you're trapped in a medieval dungeon with nothing but a 286, write 32-bit code, for gossake! You appear to be doing more "fild"s than "fstp"s - does this even work? Each "fild" pushes onto the FPU stack and each "fstp" pops, and there are only eight registers, so they need to balance. I'm surprised you don't overflow your FPU stack and crash! Is there a reason for the "fwait"s where you've got 'em?
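If the packed-BCD load/store is only there to turn a binary integer into decimal digits (that's a guess - I can't tell without the data declarations), plain integer division will do it with no FPU at all. A minimal sketch in C; the name "u32_to_dec" is made up for illustration and isn't from your code:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not from the original code: writes the decimal
   digits of `value` into `out` (which must hold at least 11 bytes) and
   returns the number of digits written. Integer division only, no FPU. */
size_t u32_to_dec(uint32_t value, char *out)
{
    char tmp[10];               /* a 32-bit value has at most 10 digits */
    size_t n = 0;

    /* Peel off digits least-significant first. */
    do {
        tmp[n++] = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0);

    /* Reverse them into the caller's buffer and terminate the string. */
    for (size_t i = 0; i < n; i++)
        out[i] = tmp[n - 1 - i];
    out[n] = '\0';
    return n;
}
```

The same divide-by-ten loop is easy to write in 32-bit assembly with "div", if that turns out to be what you need.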
Without the data declarations, it's hard to see just what you're trying to do. Looks like mostly integer arithmetic(?)... I haven't got the time/inclination to wrestle with this right now. If you'll post the complete C code - something that will compile - I'll try to get around to it... or you can do it yourself: compile it to an executable (mine will be for Linux), and disassemble it with Agner Fog's "objconv" ("-fnasm")...
http://www.agner.org/optimize
You'll have to scroll down a ways to get to your code (there's an amazing amount of "cruft" in a C program, but it doesn't seem to harm it). See what the compiler does that's so much faster. As an alternative, get the compiler to spit out assembly, and look at that. It may be hard to beat. Those compiler authors aren't dumb, and they've been at it for a while. They probably understand the FPU better than you and I do. But maybe we can do still better - no harm trying. Designing a good FPU algorithm isn't easy! FPU is perhaps "obsolete" anyway - maybe should be using SSE or so. I have no clue how that would work. If we even need FPU... looks like a lot of integer math to me...
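To make the compiler's output easier to read, it helps to put just the hot routine in a file of its own and ask for an assembly listing. A rough sketch, assuming gcc on x86; "scale_sum" is only a stand-in I made up, since I don't know what your routine actually computes:

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the routine being timed (hypothetical, not the original
   poster's code): multiply each element by k and add up the results.
   Build with optimization and ask for an assembly listing, e.g.
       gcc -O2 -S -masm=intel scale_sum.c
   then read scale_sum.s to see what the compiler did with the loop. */
int64_t scale_sum(const int32_t *v, size_t n, int32_t k)
{
    int64_t total = 0;          /* wide enough to hold each 32x32-bit product */

    for (size_t i = 0; i < n; i++)
        total += (int64_t)v[i] * k;
    return total;
}
```

With no "main" and no library calls in the file, the listing is short and nearly all of it is the loop you care about.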
But post something with the data declarations. I'm not going to guess.
Best,
Frank
P.S. I see Rob just said the same thing, more succinctly.