Recent Posts

Pages: 1 ... 3 4 [5] 6 7 ... 10
41
Using NASM / Re: How do you overide the address size in NASM when using movsb?
« Last post by debs3759 on December 17, 2023, 05:09:35 PM »
Yes, it's in the manual, that's what I checked before replying. In version 2.07 of the manual, it's described in section 10.3. In version 0.98, it was section 9.3. They're the two versions I currently use for reference.

I think the REP would be the first prefix, so you repeat the o32 each time round. Not sure on that though, it may work the other way round as well.
42
Using NASM / Re: Wrong far jump address generated by NASM 2.16.1
« Last post by ben321 on December 17, 2023, 02:01:58 AM »
Quote

As you can see, both instructions produce the same opcodes; using NASM 2.16.1, instead, this is the result:
Code:
EA0A000000        jmp word 0x0:0xa
EA0A000A00        jmp word 0xa:0xa
The address in the second instruction is clearly wrong!


Does this bug still exist in the latest RC (release candidate) version?
Guess when I first looked at it hundreds of years ago it looked like both of the instructions had distinct opcodes because they were in fact not the same haha (joke)  :)
43
Using NASM / Re: How do you overide the address size in NASM when using movsb?
« Last post by ben321 on December 17, 2023, 01:50:36 AM »
You can use the a32 prefix to override the address size. In your case, that would be

Code:
a32 movsb

Similarly, to use 16 bit addressing in 32 bit code, you can use the a16 prefix

To override operand size, it's o16 and o32

Thanks for the info. Is that somewhere in the manual? Or more of a hidden function?

Also, movsb is usually used with rep, so where does the a32 go in this case? Before or after the rep?
44
Interesting. I need high precision for things like highly detailed Mandelbrot fractal generation. The higher the precision the better. This involves complex multiplication and addition.

One complex add or subtract uses 2 fp adds or subtracts.
One complex multiply uses 4 fp multiplies, 1 fp add, and 1 fp subtract.
One complex squared performs a complex multiply with both inputs being the same complex value.
Magnitude of a complex number uses 2 fp multiplies, 1 fp add, and 1 fp square root.

These functions are all used many times over (due to the iterative nature of the Mandelbrot equation) to calculate each pixel of a Mandelbrot image.
45
The transcendental operations in the x87 are very slow; each one causes the CPU to execute on the order of 100 "micro" instructions, doing the same kind of approximation as one would do with SIMD instructions.  There is only a small inherent performance advantage in having a single "macro" instruction for the sequence of "micro" instructions.
There are some discussions on Stack Overflow that suggest that modern SIMD-based implementations can indeed be faster (see e.g. https://stackoverflow.com/questions/2683588/what-is-the-fastest-way-to-compute-sin-and-cos-together).
(As always, perform your own measurements...)
Note, however, that the actual performance of various implementations (whether in hardware or in software) is often hard to compare, as they provide different precision and work properly over different ranges.  In general, supporting less precision over a smaller range means higher performance.
46
Transcendental functions actually can be computed (i.e. approximated up to an arbitrary desired precision) using only those basic operations; efficient algorithms to do so are not particularly simple though, and are best left to be implemented by experts in that area.  As such, you should probably just use a library that has implementations for these.  A library written (in C) to compute these using SIMD operations can be found at https://sleef.org/.

If you need to do tons of arithmetic to get it to work though, is it actually any faster than the native x87 fsin and fcos instructions?
47
Using NASM / Re: How do you overide the address size in NASM when using movsb?
« Last post by debs3759 on December 16, 2023, 05:38:46 PM »
You can use the a32 prefix to override the address size. In your case, that would be

Code:
a32 movsb

Similarly, to use 16 bit addressing in 32 bit code, you can use the a16 prefix

To override operand size, it's o16 and o32
48
Transcendental functions actually can be computed (i.e. approximated up to an arbitrary desired precision) using only those basic operations; efficient algorithms to do so are not particularly simple though, and are best left to be implemented by experts in that area.  As such, you should probably just use a library that has implementations for these.  A library written (in C) to compute these using SIMD operations can be found at https://sleef.org/.
49
Using NASM / How do you overide the address size in NASM when using movsb?
« Last post by ben321 on December 16, 2023, 08:25:18 AM »
For example, in 16 bit mode, the movsb instruction looks only at the si and di registers (the low 16 bits of the esi and edi registers). However, I can use "db 0x67" to manually insert the address size override byte in front of the movsb opcode. This forces it to use the entire 32 bits of the esi and edi registers. However, I'm wondering if there's a proper instruction for this. I've tried "movsb long" and "movsb far", and so far I can't find any instruction that will cause NASM to write the 0x67 byte in front of the opcode for movsb.
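
For reference, the bytes being discussed can be modelled in a few lines of C. This is only a sketch of the machine encoding (0xF3 is the REP prefix, 0x67 the address-size override, 0xA4 the MOVSB opcode); the function `encode_movsb` is a made-up helper, and the prefix order it writes is one valid choice, not a statement about the order NASM emits.

```c
#include <assert.h>
#include <stddef.h>

enum {
    PFX_REP       = 0xF3,  /* REP prefix */
    PFX_ADDR_SIZE = 0x67,  /* address-size override prefix */
    OP_MOVSB      = 0xA4   /* MOVSB opcode */
};

/* Hypothetical helper: writes the encoding of [rep] [a32] movsb into out
   (which must hold at least 3 bytes) and returns the number of bytes. */
size_t encode_movsb(unsigned char *out, int with_rep, int addr_override)
{
    size_t n = 0;
    assert(out != NULL);
    if (with_rep)
        out[n++] = PFX_REP;
    if (addr_override)
        out[n++] = PFX_ADDR_SIZE;
    out[n++] = OP_MOVSB;       /* the opcode always comes after the prefixes */
    return n;
}
```

The CPU accepts these two prefixes in either order, as long as both precede the opcode. Note also that the address-size attribute selects the count register for REP, so in 16 bit code an overridden rep movsb counts with ecx rather than cx.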
50
There are no math instructions other than the four basic arithmetic functions (addition, subtraction, multiplication and division) with SSE (or almost all SIMD in other processors as well).

This is not a bad thing, since fsin and fcos are very slow (50-120 cycles).

Then how does Intel expect you to run any useful algorithms (which often require transcendental functions like sine or cosine) with the XMM registers (what they want you to use when in 64-bit mode)? You can't do sine and cosine simply by using a combination of add, subtract, multiply, and divide.