Topic: crc32 instruction

nobody

  • Guest
crc32 instruction
« on: November 03, 2008, 04:12:51 PM »
I posted the question about call/jmp relative offsets earlier.

I just went through and implemented all of the latest SSE instructions in my code, and am comparing the machine code I generate against NASM's output.  NASM's behavior differs from what I expected, and it doesn't seem to follow the Intel manuals.

I think the opcode is supposed to be 0x0F 0x38 0xF0 (8-bit source operand) or 0x0F 0x38 0xF1 (16/32/64-bit source operand), with a mandatory 0xF2 prefix.  For every single documented operand combination, NASM isn't generating what I would expect.  Below I've listed a few of the test instructions I'm generating, along with my (corepy) output and NASM's output for each.

crc32 r12, r12

nasm output:    f24d0f380166e4
corepy output:  f24d0f38f1e4

crc32 r12, qword [r12 + 32]

nasm output:    f24d0f380166642420
corepy output:  f24d0f38f1642420

crc32 r12, byte [rbp + -8]

nasm output:    f24c0f380165f8
corepy output:  f24c0f38f065f8
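
To make my reading of the manual concrete, here is a minimal Python sketch of how I would expect the bytes to be assembled: the 0xF2 prefix, a REX byte, 0x0F 0x38, then 0xF0 or 0xF1, then ModRM and any SIB/displacement bytes. This is not corepy's actual code. The function name `encode_crc32` and its parameters are made up for illustration, and it makes some simplifying assumptions: the caller supplies the SIB/displacement bytes directly, and REX.X and the byte-register cases that need a bare REX are not handled.

```python
def encode_crc32(dst, src_bits, mod, rm, tail=b"", dst64=True):
    """Hypothetical helper: encode CRC32 dst, r/m as I read the Intel manual.

    dst, rm  -- register numbers 0-15 (rm is the base register for memory forms)
    src_bits -- source operand size: 8, 16, 32 or 64
    mod      -- ModRM.mod field (0b11 = register, 0b01 = memory + disp8)
    tail     -- SIB and/or displacement bytes following ModRM (caller-supplied)
    dst64    -- True when the destination is a 64-bit register (sets REX.W)
    """
    out = bytearray()
    if src_bits == 16:
        out.append(0x66)                  # operand-size override for a 16-bit source
    out.append(0xF2)                      # mandatory prefix

    rex = 0x40
    if dst64:
        rex |= 0x08                       # REX.W
    if dst >= 8:
        rex |= 0x04                       # REX.R extends ModRM.reg
    if rm >= 8:
        rex |= 0x01                       # REX.B extends ModRM.rm / SIB.base
    if rex != 0x40:                       # simplification: omit an empty REX
        out.append(rex)

    # 0xF0 for an 8-bit source, 0xF1 for 16/32/64-bit sources
    out += bytes([0x0F, 0x38, 0xF0 if src_bits == 8 else 0xF1])
    out.append((mod << 6) | ((dst & 7) << 3) | (rm & 7))   # ModRM
    out += tail
    return bytes(out)


# crc32 r12, r12
print(encode_crc32(12, 64, 0b11, 12).hex())                       # f24d0f38f1e4
# crc32 r12, qword [r12 + 32]  (SIB 0x24, disp8 0x20)
print(encode_crc32(12, 64, 0b01, 12, bytes([0x24, 0x20])).hex())  # f24d0f38f1642420
# crc32 r12, byte [rbp + -8]   (disp8 0xF8)
print(encode_crc32(12, 8, 0b01, 5, bytes([0xF8])).hex())          # f24c0f38f065f8
```

Under those assumptions the sketch reproduces the corepy bytes for all three cases above.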

Which is correct?  If it is NASM, why, and where is the documentation backing it up?

I'd test this on hardware, but I don't have access to any machines with SSE 4.2.  Also, I realize some of the instructions/operands above don't make practical sense; these are just tests used to verify the machine code output.

Thanks!

H. Peter Anvin

  • NASM Developer
Re: crc32 instruction
« Reply #1 on: December 10, 2008, 06:35:49 PM »
The CRC32 instruction was broken; it is fixed in 2.06rc1.