NASM - The Netwide Assembler
NASM Forum => Programming with NASM => Topic started by: grunge_fighter on August 17, 2012, 02:42:03 PM
-
Hullo, I have been learning asm for only a few days,
and I am trying to write my own
asm routine to normalize a vector of three floats.
Here is what I came up with myself:
_asm_normalize10: ; Function begin
push ebp
mov ebp, esp
mov eax, dword [ebp+8H] ; eax = pointer to the three floats (x, y, z)
fld dword [eax] ; st0 = x
fmul st0, st0 ; st0 = x*x
fld dword [eax+4H] ; st0 = y, st1 = x*x
fmul st0, st0 ; st0 = y*y
fld dword [eax+8H] ; st0 = z, st1 = y*y, st2 = x*x
fmul st0, st0 ; st0 = z*z
faddp st1, st0 ; st0 = y*y + z*z, st1 = x*x
faddp st1, st0 ; st0 = x*x + y*y + z*z
fsqrt ; st0 = length
fld1 ; st0 = 1.0, st1 = length
fdivrp st1, st0 ; st0 = 1/length
fld dword [eax] ; reload x from memory
fmul st0, st1 ; st0 = x/length
fstp dword [eax] ; store normalized x
fld dword [eax+4H] ; reload y
fmul st0, st1 ; st0 = y/length
fstp dword [eax+4H] ; store normalized y
fld dword [eax+8H] ; reload z
fmulp st1, st0 ; st0 = z/length, 1/length is popped
fstp dword [eax+8H] ; store normalized z, FPU stack is empty again
pop ebp
ret
; _asm_normalize10 End of function
(about 90 cycles on my old (possibly too old ;-)
Pentium 4 home machine)
How could it be improved?
(I am mainly interested in a pure FPU version,
but an SSE2 version (for my P4) would also be welcome
later, and maybe also a top-notch AVX version for
the newest CPUs [it could be a routine that calculates
4 or 8 normalizations at once])
(but for now I am mainly interested in solid old
FPU code)
Thanks for any help with that
-
Well, I got as far as putting "code" tags around it for ya ("code" in square brackets at the beginning and "/code" in square brackets at the end). Improves readability slightly... Looks pretty decent, at first glance, but I'll have to "study" it a bit. Got some other things to do right now, but I'll get back to it. Remind me if I don't. :)
Best,
Frank
-
Thanks for putting it into code tags.
Decent? Thanks (I think it is lame; you have not read it yet ;)
I am not sure it is all right, I just used the few mnemonics I know ;-)
I wonder whether this one thing is wrong:
I load the vector's (x, y, z) onto the FPU stack at the beginning, then multiply, add, take the square root and do the reverse divide; then I load the vector's (x, y, z) a second time (from RAM), multiply it by the result of the reverse divide, and store it:
load (x y z)
calc
load (x y z)
calc
store (x y z)
Is that correct? Maybe I should rather keep the values further down on the FPU stack, so they are not read from RAM a second time:
load (x y z)
calc
store (x y z)
???
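The "load once" idea can be sketched like this. It is only a sketch, not measured on any CPU: the label normalize_once is a made-up name, and it assumes 32-bit cdecl with the pointer to the three floats at [esp+4]. It uses at most five of the eight FPU stack slots and leaves the stack empty on return.

```nasm
; Hypothetical load-once variant (normalize_once is an assumed name).
; Assumes 32-bit cdecl: [esp+4] = pointer to three single-precision floats.
normalize_once:
        mov     eax, [esp+4]        ; eax -> x, y, z
        fld     dword [eax]         ; st0=x
        fld     dword [eax+4]       ; st0=y  st1=x
        fld     dword [eax+8]       ; st0=z  st1=y  st2=x
        fld     st2                 ; st0=x  st1=z  st2=y  st3=x
        fmul    st0, st0            ; st0=x*x
        fld     st2                 ; st0=y  st1=x*x  st2=z  st3=y  st4=x
        fmul    st0, st0            ; st0=y*y
        faddp   st1, st0            ; st0=x*x+y*y  st1=z  st2=y  st3=x
        fld     st1                 ; st0=z  st1=x*x+y*y  st2=z  st3=y  st4=x
        fmul    st0, st0            ; st0=z*z
        faddp   st1, st0            ; st0=x*x+y*y+z*z  st1=z  st2=y  st3=x
        fsqrt                       ; st0=length
        fld1                        ; st0=1.0  st1=length  st2=z  st3=y  st4=x
        fdivrp  st1, st0            ; st0=1/length  st1=z  st2=y  st3=x
        fmul    st1, st0            ; st1=z/length
        fmul    st2, st0            ; st2=y/length
        fmulp   st3, st0            ; st3=x/length, pop: st0=z' st1=y' st2=x'
        fstp    dword [eax+8]       ; store normalized z
        fstp    dword [eax+4]       ; store normalized y
        fstp    dword [eax]         ; store normalized x, stack is empty
        ret
```

Whether this is actually faster needs timing. The second set of loads in the original most likely hits the L1 cache, and fsqrt plus the divide probably account for most of the 90 cycles in either version.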
-
Looks like the code from Harold over at SO would be an improvement.
http://stackoverflow.com/questions/12018476/asm-fpu-normalize-how-to-optimize-it
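Watch out for... well, stack overflow. :) The FPU stack only has eight registers. Overflowing it won't necessarily crash your program, AFAIK: with FPU exceptions masked (the usual default) the load just produces a NaN, so you get silently wrong results instead.
I don't know how to do SSE. Might be a candidate. This would probably make a good exercise/example for that(?).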
I still think you did a good job with it!
Best,
Frank
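As a starting point for the SSE exercise, here is a minimal scalar sketch. It is not tuned and not taken from the Stack Overflow answer. The names normalize_sse and one are made up, and it assumes 32-bit cdecl with the pointer at [esp+4]. It only uses SSE1 scalar instructions, which a Pentium 4 supports, and it does not process several vectors at once.

```nasm
; Hypothetical scalar-SSE variant (normalize_sse and one are assumed names).
; Assemble as 32-bit code, e.g. nasm -f elf32 or nasm -f win32.
section .data
one:    dd 1.0

section .text
normalize_sse:
        mov     eax, [esp+4]        ; eax -> x, y, z
        movss   xmm0, [eax]         ; xmm0 = x
        movss   xmm1, [eax+4]       ; xmm1 = y
        movss   xmm2, [eax+8]       ; xmm2 = z
        movaps  xmm3, xmm0
        mulss   xmm3, xmm3          ; xmm3 = x*x
        movaps  xmm4, xmm1
        mulss   xmm4, xmm4          ; xmm4 = y*y
        addss   xmm3, xmm4
        movaps  xmm4, xmm2
        mulss   xmm4, xmm4          ; xmm4 = z*z
        addss   xmm3, xmm4          ; xmm3 = x*x + y*y + z*z
        sqrtss  xmm3, xmm3          ; xmm3 = length
        movss   xmm4, [one]
        divss   xmm4, xmm3          ; xmm4 = 1/length
        mulss   xmm0, xmm4          ; scale each component
        mulss   xmm1, xmm4
        mulss   xmm2, xmm4
        movss   [eax], xmm0         ; store normalized x, y, z
        movss   [eax+4], xmm1
        movss   [eax+8], xmm2
        ret
```

The loads are scalar on purpose: a 16-byte movups from a 12-byte vector would read past its end. To call this from C you would still need a global directive, plus a leading underscore on the label for toolchains that expect one.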
-
will do better ;-)
-
That's the spirit!
For those who want to follow along, this question is also being discussed over at Stack Overflow...
http://stackoverflow.com/questions/12027540/carmacks-invsqrt-in-asm
You're getting good advice there, but not always in Nasm syntax. For example, Nasm doesn't use "ptr" - you can just leave it out. Nasm doesn't use "offset" either - you can just leave that out, too, but where other assemblers DON'T use "offset" you may need to add "[]" to get the same effect in Nasm.
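To make the "ptr" and "offset" points concrete, here is a small sketch with the MASM-style spelling in the comments. The label vec is made up for illustration. The instructions are only there to show syntax, they are not a complete routine.

```nasm
; Assemble as 32-bit code, e.g. nasm -f elf32 or nasm -f win32.
section .data
vec:    dd 3.0, 4.0, 0.0            ; hypothetical data label

section .text
        mov     eax, dword [ebp+8]  ; MASM: mov eax, dword ptr [ebp+8]
        mov     esi, vec            ; MASM: mov esi, offset vec  (address of vec)
        mov     eax, [vec]          ; MASM: mov eax, vec         (contents of vec)
        fld     dword [esi]         ; MASM: fld dword ptr [esi]
```

In NASM a bare label always means the address, and square brackets always mean a memory access.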
We can discuss these details here, or there, or both...
Edit: for those not familiar with "Carmack's trick" (like me) this may help:
http://en.wikipedia.org/wiki/Fast_inverse_square_root
Best,
Frank
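For reference, the trick from the Wikipedia article can be sketched in NASM roughly as follows. This is not the code from the Stack Overflow thread. The names fast_rsqrt, half and three_halves are made up, and it assumes 32-bit cdecl: a float argument at [esp+4], the result returned in st0. It does one Newton step and is only meant for positive, normal inputs.

```nasm
; Hypothetical sketch of the fast inverse square root (assumed names throughout).
; Assemble as 32-bit code, e.g. nasm -f elf32 or nasm -f win32.
section .data
half:           dd 0.5
three_halves:   dd 1.5

section .text
fast_rsqrt:
        mov     eax, [esp+4]            ; eax = bit pattern of x
        fld     dword [esp+4]           ; st0 = x
        fmul    dword [half]            ; st0 = 0.5*x
        shr     eax, 1
        mov     edx, 0x5f3759df         ; magic constant from the article
        sub     edx, eax                ; edx = bits of the first guess y
        mov     [esp+4], edx            ; reuse the argument slot for y
        fld     dword [esp+4]           ; st0 = y, st1 = 0.5*x
        fld     st0                     ; st0 = y, st1 = y, st2 = 0.5*x
        fmul    st0, st0                ; st0 = y*y
        fmulp   st2, st0                ; pop: st0 = y, st1 = 0.5*x*y*y
        fld     dword [three_halves]    ; st0 = 1.5, st1 = y, st2 = 0.5*x*y*y
        fsubrp  st2, st0                ; pop: st0 = y, st1 = 1.5 - 0.5*x*y*y
        fmulp   st1, st0                ; st0 = y*(1.5 - 0.5*x*y*y)
        ret
```

For x = 4.0 the first guess works out to about 0.483, and the Newton step brings it to about 0.499, against an exact value of 0.5.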