Topic: Is this the correct machine code for the instruction?

ben321
Is this the correct machine code for the instruction?
« on: February 13, 2022, 08:18:16 AM »
So I was documenting the machine code for various ASM instructions by taking one instruction, systematically changing just one part of it, assembling it into a flat binary file, and looking at the resulting bytes in a hex editor. I noticed something interesting about instructions whose destination operand is the memory address stored in a register. The code below is what I've tested so far (I commented out previously tested instructions so that the instruction under test wouldn't get lost in a haystack of other instructions when viewed in a hex editor).
Code:
;--------- Move Immediate Value to Register-Pointed Memory Location --------

;mov byte [edi],0x12 ; C6 07 12
;mov byte [esi],0x12 ; C6 06 12
;mov byte [ebp],0x12 ; C6 45 00 12
;mov byte [esp],0x12 ; C6 04 24 12
;mov byte [ebx],0x12 ; C6 03 12
;mov byte [edx],0x12 ; C6 02 12
;mov byte [ecx],0x12 ; C6 01 12
;mov byte [eax],0x12 ; C6 00 12

When the register is anything other than EBP or ESP, the byte after the opcode byte has 0 (zero) in the first nibble and the register number in the second nibble, and it is followed by the raw bytes of the immediate value to be stored. However, when the register is ESP, there's an extra byte after the register specifier byte, containing the value 0x24. With EBP it gets even stranger: the register specifier byte has 4 in its first nibble instead of 0 (though it still has the register number in the second nibble), and, as with ESP, there's an extra byte between the register specifier byte and the immediate data, but its value is 0 instead of 0x24. Why do ESP and EBP get this special treatment with this instruction?
« Last Edit: February 13, 2022, 08:41:00 AM by ben321 »

debs3759
Re: Is this the correct machine code for the instruction?
« Reply #1 on: February 13, 2022, 07:39:58 PM »
That is correct. From the v0.98 manual:

MOV r/m8,imm8 ; C6 /0 ib [8086]

And:

B.2.5 Effective Address Encoding: ModR/M and SIB
An effective address is encoded in up to three parts: a ModR/M byte, an optional SIB byte, and an
optional byte, word or doubleword displacement field.
The ModR/M byte consists of three fields: the mod field, ranging from 0 to 3, in the upper two bits
of the byte, the r/m field, ranging from 0 to 7, in the lower three bits, and the spare (register) field
in the middle (bit 3 to bit 5). The spare field is not relevant to the effective address being encoded,
and either contains an extension to the instruction opcode or the register value of another operand.
The ModR/M system can be used to encode a direct register reference rather than a memory access.
This is always done by setting the mod field to 3 and the r/m field to the register value of the
register in question (it must be a general-purpose register, and the size of the register must already
be implicit in the encoding of the rest of the instruction). In this case, the SIB byte and displacement
field are both absent.
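To see how this field layout applies to the bytes in the listing from the first post, here is a minimal Python sketch. It is not part of the NASM manual, and the function name `split_modrm` is made up for illustration.

```python
def split_modrm(byte):
    """Split a ModR/M byte into its (mod, spare, rm) fields."""
    mod = (byte >> 6) & 0b11      # upper two bits
    spare = (byte >> 3) & 0b111   # middle three bits (bit 3 to bit 5)
    rm = byte & 0b111             # lower three bits
    return mod, spare, rm


# ModR/M bytes taken from the listing in the first post
# (the byte that follows the C6 opcode in each instruction).
for operand, modrm in [("[eax]", 0x00), ("[edi]", 0x07),
                       ("[esp]", 0x04), ("[ebp]", 0x45)]:
    mod, spare, rm = split_modrm(modrm)
    print(f"{operand}: modrm={modrm:#04x} mod={mod} spare={spare} rm={rm}")
```

Read this way, the 4 in the high nibble of 0x45 is not a different register number: it is mod = 1 (one displacement byte follows) with r/m = 5. The spare field is 0 in every case, matching the /0 in `C6 /0 ib`.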
In 16-bit addressing mode (either BITS 16 with no 67 prefix, or BITS 32 with a 67 prefix), the
SIB byte is never used. The general rules for mod and r/m (there is an exception, given below) are:
* The mod field gives the length of the displacement field: 0 means no displacement, 1 means one
byte, and 2 means two bytes.
* The r/m field encodes the combination of registers to be added to the displacement to give the
accessed address: 0 means BX+SI, 1 means BX+DI, 2 means BP+SI, 3 means BP+DI, 4 means
SI only, 5 means DI only, 6 means BP only, and 7 means BX only.
However, there is a special case:
* If mod is 0 and r/m is 6, the effective address encoded is not [BP] as the above rules would
suggest, but instead [disp16]: the displacement field is present and is two bytes long, and no
registers are added to the displacement.
Therefore the effective address [BP] cannot be encoded as efficiently as [BX]; so if you code
[BP] in a program, NASM adds a notional 8-bit zero displacement, and sets mod to 1, r/m to 6,
and the one-byte displacement field to 0.
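The 16-bit rules and the [BP] special case can be sketched as a small decoder. This is only an illustration, not NASM's code: `RM16` and `decode_ea16` are hypothetical names, and the displacement is treated as a signed value, which the quoted text does not spell out.

```python
# r/m values 0..7 in 16-bit addressing mode, in the order listed above
RM16 = ["bx+si", "bx+di", "bp+si", "bp+di", "si", "di", "bp", "bx"]


def decode_ea16(code):
    """Decode ModR/M plus displacement for 16-bit addressing.

    `code` starts at the ModR/M byte. Returns (address text, bytes consumed).
    Register operands (mod == 3) are outside the scope of this sketch.
    """
    modrm = code[0]
    mod, rm = modrm >> 6, modrm & 7
    if mod == 3:
        raise ValueError("mod 3 encodes a register, not a memory address")
    if mod == 0 and rm == 6:
        # Special case: [disp16], no registers added
        disp = int.from_bytes(code[1:3], "little")
        return f"[{disp:#06x}]", 3
    if mod == 0:
        return f"[{RM16[rm]}]", 1
    size = 1 if mod == 1 else 2          # mod 1: one byte, mod 2: two bytes
    # Assumption: the displacement is interpreted as signed
    disp = int.from_bytes(code[1:1 + size], "little", signed=True)
    return f"[{RM16[rm]}{disp:+d}]", 1 + size


print(decode_ea16(bytes([0x07])))              # mod 0, r/m 7
print(decode_ea16(bytes([0x46, 0x00])))        # mod 1, r/m 6, disp8 = 0
print(decode_ea16(bytes([0x06, 0x34, 0x12])))  # mod 0, r/m 6: [disp16]
```

The second call is the encoding NASM falls back to for [BP]: mod 1, r/m 6 and a zero displacement byte, because mod 0 with r/m 6 is already taken by [disp16].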
In 32-bit addressing mode (either BITS 16 with a 67 prefix, or BITS 32 with no 67 prefix) the
general rules (again, there are exceptions) for mod and r/m are:
* The mod field gives the length of the displacement field: 0 means no displacement, 1 means one
byte, and 2 means four bytes.
* If only one register is to be added to the displacement, and it is not ESP, the r/m field gives its
register value, and the SIB byte is absent. If the r/m field is 4 (which would encode ESP), the
SIB byte is present and gives the combination and scaling of registers to be added to the
displacement.
If the SIB byte is present, it describes the combination of registers (an optional base register, and an
optional index register scaled by multiplication by 1, 2, 4 or 8) to be added to the displacement. The
SIB byte is divided into the scale field, in the top two bits, the index field in the next three, and
the base field in the bottom three. The general rules are:
* The base field encodes the register value of the base register.
* The index field encodes the register value of the index register, unless it is 4, in which case no
index register is used (so ESP cannot be used as an index register).
* The scale field encodes the multiplier by which the index register is scaled before adding it to
the base and displacement: 0 encodes a multiplier of 1, 1 encodes 2, 2 encodes 4 and 3 encodes 8.
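As a rough illustration of the SIB layout, the sketch below splits a SIB byte and prints the register combination under the general rules only. The names `REG32`, `split_sib` and `describe_sib` are hypothetical, and the exceptions to these rules are deliberately not handled here.

```python
# 32-bit general-purpose registers indexed by register value 0..7
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def split_sib(byte):
    """Split a SIB byte into its (scale, index, base) fields."""
    scale = (byte >> 6) & 0b11    # top two bits
    index = (byte >> 3) & 0b111   # next three bits
    base = byte & 0b111           # bottom three bits
    return scale, index, base


def describe_sib(byte):
    """Describe base and scaled index using the general rules only."""
    scale, index, base = split_sib(byte)
    text = REG32[base]
    if index != 4:                # index value 4 means "no index register"
        text += f"+{REG32[index]}*{1 << scale}"
    return text


print(split_sib(0x24), describe_sib(0x24))  # the SIB byte from the [esp] case
print(split_sib(0x98), describe_sib(0x98))
```

The 0x24 byte seen after the ModR/M byte for [esp] in the first post splits into scale 0, index 4 (no index) and base 4 (ESP), which is simply "ESP as the base, nothing else".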
The exceptions to the 32-bit encoding rules are:
* If mod is 0 and r/m is 5, the effective address encoded is not [EBP] as the above rules would
suggest, but instead [disp32]: the displacement field is present and is four bytes long, and no
registers are added to the displacement.
* If mod is 0, r/m is 4 (meaning the SIB byte is present) and base is 5, the effective address
encoded is not [EBP+index] as the above rules would suggest, but instead
[disp32+index]: the displacement field is present and is four bytes long, and there is no
base register (but the index register is still processed in the normal way).
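Putting the 32-bit rules and exceptions together, here is a hedged Python sketch of a decoder, run against the operand bytes from the listing in the first post. It is an illustration rather than how NASM itself works: `REG32` and `decode_ea32` are made-up names, and displacements added to a register are shown as signed values by assumption. EBP has register value 5, so the SIB exception is tested as base == 5.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_ea32(code):
    """Decode ModR/M (+ SIB, + displacement) for 32-bit addressing.

    `code` starts at the ModR/M byte. Returns (address text, bytes consumed).
    """
    modrm = code[0]
    mod, rm = modrm >> 6, modrm & 7
    if mod == 3:
        raise ValueError("mod 3 encodes a register, not a memory address")
    pos = 1
    parts = []
    disp_size = {0: 0, 1: 1, 2: 4}[mod]
    if rm == 4:                          # SIB byte present
        sib = code[pos]
        pos += 1
        scale, index, base = sib >> 6, (sib >> 3) & 7, sib & 7
        if mod == 0 and base == 5:       # exception: disp32, no base register
            disp_size = 4
        else:
            parts.append(REG32[base])
        if index != 4:                   # index value 4 means no index register
            parts.append(f"{REG32[index]}*{1 << scale}")
    elif mod == 0 and rm == 5:           # exception: [disp32], not [EBP]
        disp_size = 4
    else:
        parts.append(REG32[rm])
    text = "+".join(parts)
    if disp_size:
        raw = code[pos:pos + disp_size]
        pos += disp_size
        if text:                         # assumption: show as signed offset
            text += f"{int.from_bytes(raw, 'little', signed=True):+#x}"
        else:                            # absolute address, shown unsigned
            text = f"{int.from_bytes(raw, 'little'):#x}"
    return f"[{text}]", pos


# Bytes following the C6 opcode, copied from the listing in the first post
for operand_bytes in ([0x07, 0x12], [0x45, 0x00, 0x12], [0x04, 0x24, 0x12]):
    ea, used = decode_ea32(bytes(operand_bytes))
    imm = operand_bytes[used]            # the ib immediate follows the address
    print(f"mov byte {ea}, {imm:#x}")
```

This is why the two registers look special in the listing: r/m 4 is reserved to announce a SIB byte, so [ESP] needs the extra 0x24, and mod 0 with r/m 5 is reserved for [disp32], so [EBP] is emitted as mod 1 with a zero displacement byte.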