Author Topic: normalize on fpu (Read 27577 times)

grunge_fighter · « **on:** August 17, 2012, 02:42:03 PM »

Hello, I have only been learning asm for a few days,
and I am trying to write my own
asm routine to normalize a vector of three floats.

Here is what I came up with myself:

Code:

_asm_normalize10:                       ; takes a pointer to three floats at [ebp+8]
        push    ebp
        mov     ebp, esp
        mov     eax, dword [ebp+8]      ; eax = pointer to the vector (x, y, z)
        fld     dword [eax]             ; st0 = x
        fmul    st0, st0                ; st0 = x*x
        fld     dword [eax+4]           ; st0 = y, st1 = x*x
        fmul    st0, st0                ; st0 = y*y
        fld     dword [eax+8]           ; st0 = z, st1 = y*y, st2 = x*x
        fmul    st0, st0                ; st0 = z*z
        faddp   st1, st0                ; st0 = y*y + z*z, st1 = x*x
        faddp   st1, st0                ; st0 = x*x + y*y + z*z
        fsqrt                           ; st0 = length
        fld1                            ; st0 = 1.0, st1 = length
        fdivrp  st1, st0                ; st0 = 1/length
        fld     dword [eax]             ; reload x from memory
        fmul    st0, st1                ; st0 = x/length
        fstp    dword [eax]             ; store normalized x
        fld     dword [eax+4]           ; reload y
        fmul    st0, st1                ; st0 = y/length
        fstp    dword [eax+4]           ; store normalized y
        fld     dword [eax+8]           ; reload z
        fmulp   st1, st0                ; st0 = z/length, scale factor popped
        fstp    dword [eax+8]           ; store normalized z, FPU stack empty again
        pop     ebp
        ret
; _asm_normalize10 End of function

(It takes about 90 cycles on my old (possibly too old ;-)
Pentium 4 home machine.)

How could it be improved?

I am mainly interested in a pure FPU version,
but an SSE2 version (for my P4) would also be welcome
later, and maybe then a top-notch AVX version for the
newest CPUs as well (that could be a routine that calculates
4 or 8 normalizations at once).
But for now I am mainly interested in solid old
FPU code.

Thanks for any help with that.

Frank Kotler · « **Reply #1 on:** August 17, 2012, 03:47:32 PM »

Well, I got as far as putting "code" tags around it for ya ("code" in square brackets at the beginning and "/code" in square brackets at the end). Improves readability slightly... Looks pretty decent, at first glance, but I'll have to "study" it a bit. Got some other things to do right now, but I'll get back to it. Remind me if I don't.

Best,
Frank

grunge_fighter · « **Reply #2 on:** August 17, 2012, 04:16:09 PM »

Quote from: Frank Kotler on August 17, 2012, 03:47:32 PM

Well, I got as far as putting "code" tags around it for ya ("code" in square brackets at the beginning and "/code" in square brackets at the end). Improves readability slightly... Looks pretty decent, at first glance, but I'll have to "study" it a bit. Got some other things to do right now, but I'll get back to it. Remind me if I don't.

Best,
Frank

Thanks for editing it into code tags.

Decent? Thanks (I think it is lame; you have not read it yet ;)

I am not sure if it is all right; I just use the few mnemonics I know ;-)

I wonder whether this one thing is wrong:

I load the vector's (x, y, z) onto the FPU stack at the beginning, then multiply, add, take the square root and do the reverse divide; then I load the vector's (x, y, z) a second time (from RAM), multiply it by the reverse-divide result, and store it:

load (x y z)
calc
load (x y z)
calc
store (x y z)

Is that correct? Maybe it would be better to keep the values further down on the FPU stack, so that I do not have to read them from RAM a second time:

load (x y z)
calc
store (x y z)
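
The second scheme (load once, calculate, store) could be sketched roughly as below. This is only an illustration under assumptions: the label `asm_normalize_stack` is made up, the routine is 32-bit cdecl with the pointer at [esp+4], and the FPU stack is assumed to be empty on entry (the deepest point uses five of the eight registers). Whether it is actually faster would have to be measured, since the square root and the divide probably dominate the run time either way.

```nasm
section .text
global  asm_normalize_stack             ; hypothetical name, not from the thread

asm_normalize_stack:
        mov     eax, [esp+4]            ; eax = pointer to (x, y, z)
        fld     dword [eax+8]           ; st0 = z
        fld     dword [eax+4]           ; st0 = y, st1 = z
        fld     dword [eax]             ; st0 = x, st1 = y, st2 = z
        fld     st0                     ; copy x: st0 = x, st1 = x, st2 = y, st3 = z
        fmul    st0, st0                ; st0 = x*x
        fld     st2                     ; copy y on top
        fmul    st0, st0                ; st0 = y*y, st1 = x*x
        faddp   st1, st0                ; st0 = x*x + y*y, st1 = x, st2 = y, st3 = z
        fld     st3                     ; copy z on top
        fmul    st0, st0                ; st0 = z*z
        faddp   st1, st0                ; st0 = x*x + y*y + z*z
        fsqrt                           ; st0 = length
        fld1                            ; st0 = 1.0, st1 = length
        fdivrp  st1, st0                ; st0 = 1/length, st1 = x, st2 = y, st3 = z
        fmul    st1, st0                ; st1 = x/length
        fmul    st2, st0                ; st2 = y/length
        fmulp   st3, st0                ; st3 = z/length, then pop the scale factor
        fstp    dword [eax]             ; store normalized x
        fstp    dword [eax+4]           ; store normalized y
        fstp    dword [eax+8]           ; store normalized z, FPU stack empty again
        ret
```

Like the original routine, this does not check for a zero-length vector, so (0, 0, 0) ends up dividing by zero.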

Frank Kotler · « **Reply #3 on:** August 18, 2012, 10:52:01 PM »

Looks like the code from Harold over at SO would be an improvement.

http://stackoverflow.com/questions/12018476/asm-fpu-normalize-how-to-optimize-it

Watch out for... well, stack overflow.

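Overflowing the FPU stack (it only has eight registers) won't necessarily crash your program, but with FPU exceptions masked (the usual default) the overflowing load just gives you a NaN, so your results turn to garbage, AFAIK.

I don't know how to do SSE. Might be a candidate. This would probably make a good exercise/example for that(?).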

I still think you did a good job with it!

Best,
Frank
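
For the SSE idea mentioned above, a scalar version might look something like the sketch below. It is not the code from the Stack Overflow answer; the label `sse_normalize` and the constant `one` are invented for the example, and it assumes 32-bit cdecl with the pointer at [esp+4]. It uses scalar loads because a vector of three floats is only 12 bytes, so a 16-byte packed load would read past the end.

```nasm
section .data
one:    dd 1.0                          ; hypothetical constant for the reciprocal

section .text
global  sse_normalize                   ; hypothetical name, not from the thread

sse_normalize:
        mov     eax, [esp+4]            ; eax = pointer to (x, y, z)
        movss   xmm0, [eax]             ; xmm0 = x
        movss   xmm1, [eax+4]           ; xmm1 = y
        movss   xmm2, [eax+8]           ; xmm2 = z
        movaps  xmm3, xmm0
        mulss   xmm3, xmm3              ; xmm3 = x*x
        movaps  xmm4, xmm1
        mulss   xmm4, xmm4              ; xmm4 = y*y
        addss   xmm3, xmm4              ; xmm3 = x*x + y*y
        movaps  xmm4, xmm2
        mulss   xmm4, xmm4              ; xmm4 = z*z
        addss   xmm3, xmm4              ; xmm3 = x*x + y*y + z*z
        sqrtss  xmm3, xmm3              ; xmm3 = length
        movss   xmm4, [one]             ; xmm4 = 1.0
        divss   xmm4, xmm3              ; xmm4 = 1/length
        mulss   xmm0, xmm4              ; scale each component
        mulss   xmm1, xmm4
        mulss   xmm2, xmm4
        movss   [eax], xmm0             ; store normalized x
        movss   [eax+4], xmm1           ; store normalized y
        movss   [eax+8], xmm2           ; store normalized z
        ret
```

The `sqrtss`/`divss` pair could be swapped for a single `rsqrtss` if an approximate result (roughly 12 bits of precision) is good enough.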

grunge_fighter · « **Reply #4 on:** August 19, 2012, 09:29:52 AM »

Quote from: Frank Kotler on August 18, 2012, 10:52:01 PM

Looks like the code from Harold over at SO would be an improvement.

.....

I still think you did a good job with it!

Best,
Frank

will do better ;-)

Frank Kotler · « **Reply #5 on:** August 19, 2012, 05:46:29 PM »

That's the spirit!

For those who want to follow along, this question is also being discussed over at Stack Overflow...

http://stackoverflow.com/questions/12027540/carmacks-invsqrt-in-asm

You're getting good advice there, but not always in Nasm syntax. For example, Nasm doesn't use "ptr" - you can just leave it out. Nasm doesn't use "offset" either - you can just leave that out, too, but where other assemblers DON'T use "offset" you may need to add "[]" to get the same effect in Nasm.
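
A small illustration of those syntax differences, with the MASM-style spelling in the comments. The names `table` and `syntax_demo` are made up for the example:

```nasm
section .data
table:  dd 10, 20, 30                   ; hypothetical data

section .text
syntax_demo:                            ; hypothetical label
        mov     esi, table              ; MASM: mov esi, offset table   (address of table)
        mov     eax, dword [esi]        ; MASM: mov eax, dword ptr [esi]
        mov     edx, [table]            ; MASM: mov edx, table          (contents of table)
        ret
```

In Nasm a bare label is always the address, and square brackets always mean a memory access.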

We can discuss these details here, or there, or both...

Edit: for those not familiar with "Carmack's trick" (like me) this may help:

http://en.wikipedia.org/wiki/Fast_inverse_square_root

Best,
Frank
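
As a rough idea of what the trick from that Wikipedia article looks like in Nasm syntax, here is a minimal sketch. It is not the code from the Stack Overflow thread; the labels `fast_rsqrt`, `half` and `three_halves` are invented, and it assumes 32-bit cdecl with a positive float argument at [esp+4] and the result returned in st0.

```nasm
section .data
half:           dd 0.5                  ; hypothetical constants
three_halves:   dd 1.5

section .text
global  fast_rsqrt                      ; hypothetical name

fast_rsqrt:
        mov     eax, [esp+4]            ; eax = raw bits of x
        mov     edx, 0x5f3759df         ; the "magic" constant
        shr     eax, 1
        sub     edx, eax                ; edx = bits of the first guess y
        fld     dword [esp+4]           ; st0 = x
        fmul    dword [half]            ; st0 = 0.5*x
        mov     [esp+4], edx            ; reuse the argument slot for the guess
        fld     dword [esp+4]           ; st0 = y, st1 = 0.5*x
        fmul    st1, st0                ; st1 = 0.5*x*y
        fmul    st1, st0                ; st1 = 0.5*x*y*y
        fld     dword [three_halves]    ; st0 = 1.5, st1 = y, st2 = 0.5*x*y*y
        fsubrp  st2, st0                ; st0 = y, st1 = 1.5 - 0.5*x*y*y
        fmulp   st1, st0                ; st0 = y*(1.5 - 0.5*x*y*y), one Newton step
        ret
```

With a single Newton step the result is only an approximation of 1/sqrt(x) (the article gives a maximum relative error of about 0.2%), so it trades accuracy for skipping the square root and the divide.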

NASM - The Netwide Assembler
