Recent Posts

41
If you want deep zoom levels for the Mandelbrot set, you'll eventually need more precision than any hardware floating-point format provides. At that point you'll need arbitrary-precision floating-point support. You can write your own (with a lot of effort) or use an existing package that provides it, e.g. GNU MPFR.
There is going to be a big performance penalty for doing so, but you won't have a choice if you need the extra precision.
A good implementation with zoom capabilities will probably start with ordinary float/double at low zoom levels and switch to a higher-precision implementation only when more precision is needed: first to quad floats (which AFAIK are available in sleef), and ultimately to an arbitrary-precision package like MPFR, selecting a suitable precision for the zoom level of interest.
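
A rough sketch of how such a tier selection might look, written in Python for brevity. The function name pick_precision, the guard_bits margin of 8, and the rule itself (compare the pixel spacing against the significand width of each format) are my own assumptions for illustration; they are not taken from sleef or MPFR.

```python
import math

# Significand widths (in bits) of the IEEE 754 binary32, binary64 and
# binary128 formats, cheapest first.
SIGNIFICAND_BITS = (("float", 24), ("double", 53), ("quad", 113))


def pick_precision(center_magnitude, pixel_spacing, guard_bits=8):
    """Return (tier, bits_needed) for a view of the complex plane.

    center_magnitude: rough size of the coordinates being rendered.
    pixel_spacing:    distance between adjacent pixels in the plane.
    guard_bits:       assumed safety margin for rounding error that
                      builds up over many iterations (a guess, tune it).
    """
    # Bits required just to tell two adjacent pixels apart.
    ratio = max(center_magnitude, pixel_spacing) / pixel_spacing
    bits_needed = math.ceil(math.log2(ratio)) + guard_bits
    for name, bits in SIGNIFICAND_BITS:
        if bits_needed <= bits:
            return name, bits_needed
    # Nothing in hardware (or quad) is enough: fall back to an
    # arbitrary-precision package and ask it for bits_needed bits.
    return "arbitrary", bits_needed


if __name__ == "__main__":
    for spacing in (1e-3, 1e-10, 1e-20, 1e-40):
        print(spacing, pick_precision(1.0, spacing))
```

The returned bit count is what you would hand to the arbitrary-precision package as its working precision once the fixed-width formats run out.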
42
Using NASM / Re: How do you overide the address size in NASM when using movsb?
« Last post by debs3759 on December 17, 2023, 05:09:35 PM »
Yes, it's in the manual, that's what I checked before replying. In version 2.07 of the manual, it's described in section 10.3. In version 0.98, it was section 9.3. They're the two versions I currently use for reference.

I think the REP would be the first prefix, so you repeat the a32 each time round. Not sure on that though; it may work the other way round as well.
43
Using NASM / Re: Wrong far jump address generated by NASM 2.16.1
« Last post by ben321 on December 17, 2023, 02:01:58 AM »
Quote

As you can see, both instructions produce the same opcodes; using NASM 2.16.1, instead, this is the result:
Code:
EA0A000000        jmp word 0x0:0xa
EA0A000A00        jmp word 0xa:0xa
The address in the second instruction is clearly wrong!


Does this bug still exist in the latest RC (release candidate) version?
Guess when I first looked at it hundreds of years ago it looked like both of the instructions had distinct opcodes because they were in fact not the same haha (joke)  :)
44
Using NASM / Re: How do you overide the address size in NASM when using movsb?
« Last post by ben321 on December 17, 2023, 01:50:36 AM »
Quote
You can use the a32 prefix to override the address size. In your case, that would be

Code:
a32 movsb

Similarly, to use 16-bit addressing in 32-bit code, you can use the a16 prefix.

To override operand size, it's o16 and o32.

Thanks for the info. Is that somewhere in the manual, or is it more of a hidden feature?

Also, movsb is usually used with rep, so where does the a32 go in this case? Before or after the rep?
45
Interesting. I need high precision for things like highly detailed Mandelbrot fractal generation. The higher the precision, the better. This involves complex multiplication and addition.

One complex add or subtract uses 2 fp adds or subtracts.
One complex multiply uses 4 fp multiplies, 1 fp add, and 1 fp subtract.
One complex square is a complex multiply with both inputs being the same complex value.
Magnitude of a complex number uses 2 fp multiplies, 1 fp add, and 1 fp square root.

These operations are all used many times over (due to the iterative nature of the Mandelbrot equation) to calculate each pixel of a Mandelbrot image.
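
The operation counts above can be sketched as follows, in Python for brevity, with complex values kept as separate real and imaginary parts so each fp operation is visible. All function names here are made up for illustration. One deviation from the list above, which is my own choice: the escape test compares the squared magnitude against 4 rather than the magnitude against 2, which avoids the square root.

```python
def complex_add(ar, ai, br, bi):
    # 2 fp adds
    return ar + br, ai + bi


def complex_mul(ar, ai, br, bi):
    # 4 fp multiplies, 1 fp subtract, 1 fp add
    return ar * br - ai * bi, ar * bi + ai * br


def magnitude_squared(zr, zi):
    # 2 fp multiplies, 1 fp add (no square root needed)
    return zr * zr + zi * zi


def escape_time(cr, ci, max_iter=100):
    """Iterate z = z*z + c from z = 0 and count steps until |z| > 2."""
    zr, zi = 0.0, 0.0
    for n in range(max_iter):
        if magnitude_squared(zr, zi) > 4.0:
            return n
        zr, zi = complex_mul(zr, zi, zr, zi)   # complex square
        zr, zi = complex_add(zr, zi, cr, ci)   # add c
    return max_iter  # never escaped: treat the point as inside the set


if __name__ == "__main__":
    print(escape_time(0.0, 0.0))  # inside the set
    print(escape_time(2.0, 2.0))  # escapes almost immediately
```

This inner loop runs up to max_iter times per pixel, which is why the cost of each fp operation dominates the rendering time.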
46
The transcendental operations in the x87 are very slow; they actually cause the CPU to execute on the order of 100 "micro" instructions, doing the same kind of approximation as one would do with SIMD instructions. There is only a small inherent performance advantage in having a single "macro" instruction for the sequence of "micro" instructions.
There are some discussions on Stack Overflow that suggest that modern SIMD-based implementations can indeed be faster (see e.g. https://stackoverflow.com/questions/2683588/what-is-the-fastest-way-to-compute-sin-and-cos-together).
(As always, perform your own measurements...)
Note, however, that the actual performance of various implementations (whether in hardware or in software) is often hard to compare, as they provide different precision and work properly over different ranges. In general, supporting less precision over a smaller range means higher performance.
47
Quote
Transcendental functions actually can be computed (i.e. approximated up to an arbitrary desired precision) using only those basic operations; efficient algorithms to do so are not particularly simple though, and are best left to be implemented by experts in that area. As such, you should probably just use a library that has implementations for these. A library written (in C) to compute these using SIMD operations can be found at https://sleef.org/.

If you need to do tons of arithmetic to get it to work though, is it actually any faster than the native x87 fsin and fcos instructions?
48
Using NASM / Re: How do you overide the address size in NASM when using movsb?
« Last post by debs3759 on December 16, 2023, 05:38:46 PM »
You can use the a32 prefix to override the address size. In your case, that would be

Code:
a32 movsb

Similarly, to use 16-bit addressing in 32-bit code, you can use the a16 prefix.

To override operand size, it's o16 and o32.
49
Transcendental functions actually can be computed (i.e. approximated up to an arbitrary desired precision) using only those basic operations; efficient algorithms to do so are not particularly simple though, and are best left to be implemented by experts in that area.  As such, you should probably just use a library that has implementations for these.  A library written (in C) to compute these using SIMD operations can be found at https://sleef.org/.
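
To illustrate the claim that only basic operations are needed, here is a minimal sketch of sin(x) as a truncated Taylor series, in Python for brevity. The name sin_taylor and the default of 15 terms are my own choices for illustration. This is not how sleef does it; a production library would be expected to use more careful range reduction and tuned polynomial coefficients, so treat this as a demonstration of the idea only.

```python
import math


def sin_taylor(x, terms=15):
    """Approximate sin(x) using only add, subtract, multiply and divide
    (plus one floating-point remainder for range reduction)."""
    # Range reduction: bring x into [-pi, pi] so the series converges
    # quickly. This simple reduction loses accuracy for very large |x|.
    x = math.remainder(x, 2.0 * math.pi)

    x2 = x * x
    term = x      # first term of the series: x^1 / 1!
    total = x
    for k in range(1, terms):
        # Each term is the previous one times -x^2 / ((2k)(2k+1)).
        term *= -x2 / ((2 * k) * (2 * k + 1))
        total += term
    return total


if __name__ == "__main__":
    for value in (0.5, 1.0, 3.0, 10.0):
        print(value, sin_taylor(value), math.sin(value))
```

Fewer terms over a narrower input range would be faster but less accurate, which is the precision/range/speed trade-off that makes implementations hard to compare.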
50
Using NASM / How do you overide the address size in NASM when using movsb?
« Last post by ben321 on December 16, 2023, 08:25:18 AM »
For example, in 16-bit mode, the movsb instruction looks only at the si and di registers (the low 16 bits of the esi and edi registers). However, I can use "db 0x67" to manually insert the address-size override byte in front of the movsb opcode. This forces it to use the entire 32 bits of the esi and edi registers. I'm wondering if there's a proper instruction for this. I've tried "movsb long" and "movsb far", and so far I can't find any instruction that will cause NASM to write the 0x67 byte in front of the opcode for movsb.