Topic: 80x87 calculations are more "precise" than SSE/AVX?

fredericopissarra
80x87 calculations are more "precise" than SSE/AVX?
« on: June 25, 2023, 02:46:08 PM »
Simple answer: YES, they are.

Fp87, by default, uses extended precision, which has a 64-bit significand. IEEE-754 describes three precisions that matter here: single precision (24 bits of precision), double precision (53 bits) and extended precision (64 bits on fp87). SSE/AVX deal only with the first two. Fp87 works in the third all the time (by default; you can change that through the precision-control bits of the FPU control word). When you load a single precision value it is converted to extended precision internally. The same goes for double precision, and both conversions are exact. When you store a value in single or double precision it is converted again, this time with rounding.
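
To see which of these formats your C compiler actually hands you, <float.h> exposes the significand widths. This is only a small sketch: the function name print_precisions is made up for illustration, and long double is the 80-bit fp87 format only on some ABIs (GCC/Clang on x86 Linux, for example; other toolchains may map it to double or to a 128-bit type).

```c
#include <float.h>
#include <stdio.h>

/* Illustrative helper (not from the original post): prints how many
   significand bits each C floating point type carries. */
void print_precisions(void)
{
  printf("float:       %d bits\n", FLT_MANT_DIG);   /* 24 for IEEE single */
  printf("double:      %d bits\n", DBL_MANT_DIG);   /* 53 for IEEE double */
  printf("long double: %d bits\n", LDBL_MANT_DIG);  /* 64 if it maps to fp87 extended */
}
```

Call print_precisions() from main to get the three widths for your platform.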

This is useful. Take a look at these two routines in ASM:
Code:
; one.asm
  bits  64
  default rel

  section .text

  global oneF87
  global oneSSE2

  align 4
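oneF87:
  fld   qword [tenth]       ; st0 = 0.1 (double widened to extended, exact)
  mov   ecx,9               ; nine more additions
.loop:
  fadd  qword [tenth]       ; st0 += 0.1, kept in extended precision
  dec   ecx
  jnz   .loop

  fstp  qword [rsp-8]       ; round to double once, storing into the red zone
  movsd xmm0,[rsp-8]        ; SysV ABI returns a double in xmm0

  ret

oneSSE2:
  movsd xmm0,[tenth]        ; xmm0 = 0.1 (double)
  mov   ecx,9               ; nine more additions
.loop:
  addsd xmm0,[tenth]        ; xmm0 += 0.1, rounded to double every time
  dec   ecx
  jnz   .loop
  ret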

  section .rodata

tenth:
  dq    0.1
Code:
// test.c
#include <stdio.h>

extern double oneF87( void );
extern double oneSSE2( void );

int main( void )
{
  static const char *yesno[] = { "no", "yes" };
  double a, b;

  a = oneF87();
  b = oneSSE2();

  printf ( "oneF87()  == 1.0? [%s]\n"
           "oneSSE2() == 1.0? [%s]\n",
           yesno[ a == 1.0 ], yesno[ b == 1.0 ] );
}
Code:
$ nasm -felf64 -o one.o one.asm
$ cc -O2 -c -o test.o test.c
$ cc -s -o test test.o one.o
$ ./test
oneF87()  == 1.0? [yes]
oneSSE2() == 1.0? [no]
What's going on? Why does the fp87 code say the sum of ten 0.1 values is exactly 1.0 while the SSE code, which does the same thing, says it isn't!?
That's because the fp87 code does its operations in extended precision... the result isn't exactly 1.0, but when it is converted to double the error is rounded away (by coincidence). SSE2 deals, here, with scalar doubles directly (this is explicit in the 'sd' instruction suffix), so every single addition is rounded to double.
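
The same experiment can be sketched in plain C, as a rough equivalent of the two routines. The function names are made up for illustration, and the sketch assumes an x86-64 target where double arithmetic uses SSE2 and long double is the 80-bit fp87 format (GCC/Clang on Linux); on other targets the second function may behave differently.

```c
#include <stdio.h>

/* Adds the double 0.1 ten times, rounding to double after every
   addition (what addsd does). */
double sum_in_double(void)
{
  volatile double tenth = 0.1;   /* volatile: force real loads, no folding */
  double acc = tenth;
  for (int i = 0; i < 9; i++)
    acc += tenth;
  return acc;
}

/* Same sum, but accumulated in long double and rounded to double only
   once, at the end (what fld/fadd/fstp qword does when long double is
   the fp87 extended format). */
double sum_in_long_double(void)
{
  volatile double tenth = 0.1;
  long double acc = tenth;       /* double -> long double is exact */
  for (int i = 0; i < 9; i++)
    acc += tenth;
  return (double)acc;            /* the single rounding step */
}
```

Under those assumptions, sum_in_double() should compare unequal to 1.0 and sum_in_long_double() should compare equal, matching what the two NASM routines print.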

Strangely, fp87 offers you a misleading value here... 0.1 has no exact binary representation, so adding the double 0.1 ten times, rounding to double after each step, does not give exactly 1.0.

This doesn't mean you can't take advantage of greater precision. The routine loads the double 0.1, which is 0.1000000000000000055511151231257827021181583404541015625 in decimal, and the conversion to extended precision keeps that value exactly. The 11 extra bits are enough to carry the ten additions without any intermediate rounding. Since double precision has, roughly, 16 decimal digits of precision, the final value, calculated in extended precision, is something like 1.0000000000000000xxxxx, where xxxxx is the error. That error sits beyond what a double can hold, so the value is rounded to 1.0 when stored as a double. But, notice... this isn't the "real" floating point result from adding 0.1 (double) 10 times in double precision...

The actual final values are: 0.99999999999999988897769753748434595763683319091796875 (double) and 1.000000000000000055511151231257827021181583404541015625 (long double). The latter is only about 5.55e-17 above 1.0, less than half the distance between 1.0 and the next double (about 2.22e-16), so it rounds to 1.0 when stored as a double.
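
How close the long double sum is to being rounded the other way can be measured directly. A hedged sketch, again assuming long double has at least the 64-bit significand of fp87; the function names are illustrative only.

```c
#include <float.h>
#include <stdio.h>

/* Sum of ten double 0.1 values, accumulated in long double. */
long double ext_sum(void)
{
  volatile double tenth = 0.1;   /* volatile: force real loads, no folding */
  long double acc = tenth;
  for (int i = 0; i < 9; i++)
    acc += tenth;
  return acc;
}

/* Distance of the extended sum above 1.0, as a fraction of the distance
   to the midpoint between 1.0 and the next double (DBL_EPSILON / 2).
   A ratio between 0 and 1 means the sum rounds to 1.0 when stored as a
   double; a ratio above 1 would round to the next double instead. */
long double residual_over_half_ulp(void)
{
  return (ext_sum() - 1.0L) / (DBL_EPSILON / 2);
}

void print_residual(void)
{
  printf("sum - 1.0           = %.20Le\n", ext_sum() - 1.0L);
  printf("half ULP at 1.0     = %.20e\n", DBL_EPSILON / 2);
  printf("residual / half ULP = %.3Lf\n", residual_over_half_ulp());
}
```

With the 80-bit format the ratio should come out as 0.5: the sum is a quarter of a double ULP above 1.0, so the store to double lands on 1.0.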

So, yes... fp87 offers better precision for calculations, but beware! This is not a panacea for better results.
« Last Edit: June 25, 2023, 02:47:42 PM by fredericopissarra »

munair
Re: 80x87 calculations are more "precise" than SSE/AVX?
« Reply #1 on: July 20, 2023, 07:00:27 AM »
Thank you for sharing! I still have to add support for real numbers to the SharpBASIC compiler and there is a lot to consider such as the default instruction set. Your information is helpful.
SharpBASIC (www.sharpbasic.com) is a compiler in development that uses NASM as backend.