Author Topic: Support for FISTTP? (Read 19121 times)

Rookie · « **on:** November 26, 2018, 01:26:28 PM »

Hi!

I am trying to assemble code with FISTTP (store integer with truncation and pop). Apparently it was introduced with SSE3.

I am currently hard-coding
FISTTP QWORD[RAX]
as
DW 8DDh
since I get an "error: no instruction for this cpu level" message.

What CPU level do I need?

dreamCoder · « **Reply #1 on:** November 26, 2018, 04:38:53 PM »

Without code or any background information, I can't tell you much about your problem. But I tested this on an i7 under Win10 and it works.

Code:

        extern printf

        section .data
x:      dq 18.7653
format: db '%llu',0ah,0

        section .text
        global main
main:
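        sub     rsp,40      ; Win64: 32-byte shadow space + 8 to realign the stack
        finit
        mov     rax,x
        fld     qword[rax]  ; st0 = 18.7653
        db      0xdd        ; fisttp qword[rax], hand-encoded:
        db      0x08        ; opcode DD /1, ModRM 08h = [rax]; x now holds 18
        mov     rdx,[x]     ; 2nd printf argument (rsi under SysV)
        mov     rcx,format  ; 1st printf argument (rdi under SysV)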
        call    printf
        add     rsp,40
        ret

But I don't understand the need to hard-code the instruction in hex. If the CPU predates Prescott (where SSE3 first appeared), the hard-coded bytes won't run either. Is there a particular reason? Perhaps you should update your NASM.

Rookie · « **Reply #2 on:** November 26, 2018, 07:20:34 PM »

There are two versions of the macro. The first is the ideal one, and the second is the workaround that goes through rax.

Code:

;---------------------------------------------- froundz
; st0 <- st0 rounded towards zero at non-zero st1 intervals
%macro	froundz 0		; x,|n|
	fdiv	st0,st1		; xi=x/|n|,|n|
	sub	rsp,8
	fisttp	qword[rsp]
	fild	qword[rsp]	; Trunc(xi),|n|
	fmul			; Trunc(xi)|n|
	add	rsp,8
%endmacro
;---------------------------------------------- froundz2
; st0 <- st0 rounded towards zero at non-zero st1 intervals
%macro	froundz2 0		; x,|n|
	fdiv	st0,st1		; xi=x/|n|,|n|
	sub	rsp,8
	mov	rax,rsp

	;fisttp	qword[rax]	; encoded as bytes DD 08: opcode DD /1, ModRM 08h = [rax]
	dw	8DDh		; |n|  (word 0x08DD is stored little-endian: DD 08)

	fild	qword[rax]	; Trunc(xi),|n|
	fmul			; Trunc(xi)|n|
	add	rsp,8
%endmacro

The assembly command gives the following output:

Code:


Assemble.cmd
------------
Copyright (c) 1997  Analytical Logic
All rights reserved

Assembling project: FastMath ...

32-bit Borland Assembly ... Failed!

FastMath.32.asm:618: error: no instruction for this cpu level
FastMath.inc:796: ... from macro `froundz' defined here


32-bit Microsoft Assembly ... Failed!

FastMath.32.asm:618: error: no instruction for this cpu level
FastMath.inc:796: ... from macro `froundz' defined here


64-bit Borland Assembly ... Ok

64-bit Microsoft Assembly ... Ok

64-bit Microsoft DLL ... Ok

Moving objects:

        FastMath.x64.obj ... Ok
        FastMath.o ... Ok
        FastMath.dll ... Ok

C:\Source\Analog\FastMath\Nasm>

NASM is version 2.12.
I forgot to mention that only the 32-bit assembly throws the error and requires the hard-coded opcode.

dreamCoder · « **Reply #3 on:** November 26, 2018, 08:37:07 PM »

Rookie, is that 64-bit addressing mode "qword[rax]" being assembled as 32-bit code? Shouldn't it be "qword[eax]" instead?

If FISTTP is not supported, you could use one of two methods:

1) FRNDINT, with the rounding control set to round toward zero. But I bet you already knew this.
2) FXTRACT (it leaves the exponent in ST1). Then use that exponent to SHR and SHL the float's bit pattern. The exponent tells you how many mantissa bits belong to the integer part, starting 12 bits from the left (after the sign bit and the 11 exponent bits of a double). Shifting the remaining bits out and back in clears the fraction.

Rookie · « **Reply #4 on:** November 26, 2018, 10:59:17 PM »

FastMath.inc is written purely in 64-bit.
FastMath.32.asm handles the 32-bit-specific prologue/epilogue and maps rax -> eax etc. (so the code becomes FISTTP QWORD[EAX]).

The library deliberately avoids FXTRACT (13), FPREM (16-64) and FSCALE (20-31) because of their dismal performance. FRNDINT (9-20) is out of the question because it requires a control-word change. It is also slow: even the worst case of FISTP/FILD with a read-after-write penalty is as fast.

The 32-bit libraries work fine with the hard-coded opcode. So the hardware supports the instruction, it's just the 32-bit assembly that rejects it.

SOLVED: I had to specify CPU PRESCOTT instead of CPU 686!

dreamCoder · « **Reply #5 on:** November 26, 2018, 11:45:21 PM »

Oh, OK. Glad I mentioned Prescott.

But the question still remains. If it is solved by setting the minimum CPU to Prescott, then there's no apparent need to hard-code it anymore, because Prescott and later CPUs do support the FISTTP instruction (I think). But as long as it is solved, I am happy for you.

Rookie · « **Reply #6 on:** November 27, 2018, 12:05:53 AM »

Thanks for the help!

CPU PRESCOTT allows the FISTTP QWORD[rsp] (preferred) to assemble so I was able to remove the hard-coded opcode.

NASM - The Netwide Assembler

News:

Author Topic: Support for FISTTP? (Read 19121 times)

Rookie

Support for FISTTP?

dreamCoder

Re: Support for FISTTP?

Rookie

Re: Support for FISTTP?

dreamCoder

Re: Support for FISTTP?

Rookie

Re: Support for FISTTP?

dreamCoder

Re: Support for FISTTP?

Rookie

Re: Support for FISTTP?