Topic: How many bytes are supposed to be on the stack for RETF from 32bit real mode?

ben321
Ok, so I've been experimenting with 32-bit protected mode, and I'm trying to get back to 16-bit real mode. Of course, when you clear the protected-mode (PE) bit in CR0, you end up in a state somewhere between 32-bit protected mode and 16-bit real mode. I think this is called 32-bit real mode. This state remains until you execute a far control transfer that reloads CS (which puts the CPU back into 16-bit real mode). In my case, since I got into protected mode with a far CALL instruction, I should be returning with a RETF instruction. However, while the initial call from 16-bit real mode put 4 bytes on the stack (16-bit segment and 16-bit offset), the return to 16-bit real mode (because the RETF runs in 32-bit real mode) should expect 6 bytes on the stack (16-bit segment and 32-bit offset). I've already compensated for this by adjusting the sizes of the values on the stack before the RETF.
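For comparison, the procedure in Intel's SDM (Volume 3) for returning to real-address mode avoids this in-between state: it far-jumps to a 16-bit code segment with a 64 KB limit before clearing PE, so CS already holds a 16-bit descriptor when PE goes to 0, and then uses a far JMP to load a real-mode CS. A minimal NASM sketch of that ordering follows. The selector values and labels (CODE16_SEL, DATA16_SEL, RM_SEG, leave_pm, pm16, rm_entry) are hypothetical, and it assumes interrupts are disabled, paging is off, and the code is assembled with an ORG that matches RM_SEG.

```nasm
CODE16_SEL equ 0x18            ; hypothetical: 16-bit code descriptor, limit 0xFFFF
DATA16_SEL equ 0x20            ; hypothetical: 16-bit data descriptor, limit 0xFFFF
RM_SEG     equ 0x0000          ; hypothetical real-mode code segment

[bits 32]
leave_pm:
        ; assumed: interrupts disabled, paging off
        jmp CODE16_SEL:pm16    ; CS now caches a 16-bit (D=0) descriptor

[bits 16]
pm16:
        mov ax, DATA16_SEL     ; give the data/stack registers 64 KB limits
        mov ds, ax
        mov es, ax
        mov ss, ax
        mov eax, cr0
        and eax, ~1            ; clear PE (bit 0)
        mov cr0, eax
        jmp RM_SEG:rm_entry    ; far JMP loads a real-mode CS (base = RM_SEG * 16)

rm_entry:
        xor ax, ax             ; reload segment registers with real-mode values
        mov ds, ax
        mov es, ax
        mov ss, ax
        ; load a real-mode IDTR with LIDT here before re-enabling interrupts
        jmp $
```

With this ordering, every far transfer after PE is cleared already runs with a 16-bit operand size, so a 16-bit far frame is consumed as 4 bytes.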

Now I've tested my code in DOSBox, but something strange is going on. Instead of using only 6 bytes for the RETF instruction, it seems to be using 8 bytes (32-bit segment and 32-bit offset). As a result, my stack kept getting out of sync, even though the code pointer ended up at the right location after the RETF. It's as if it popped 2 extra bytes off the stack, and I'm not sure what they were even used for. This doesn't make any sense: the segment part of an address is never a 32-bit number. I managed to fix it by pushing the segment number onto the stack as a 32-bit number (and of course keeping the offset as 32 bits as well). But from my understanding, this extra fix (using 8 bytes on the stack for a 32-bit RETF) shouldn't be needed. Is this just a DOSBox bug, or is this behavior correct for real hardware too? Is the fact that the RETF instruction runs in this strange 32-bit real mode, while the destination segment has a different bitness (16-bit real mode), something that should actually cause the behavior I'm seeing?
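For reference, Intel's SDM pseudocode for a far RET with a 32-bit operand size pops EIP and then pops a full 4-byte slot for CS, discarding the upper 16 bits of that slot; a far CALL with a 32-bit operand size correspondingly pushes CS padded to 4 bytes. That matches the 8 bytes observed here, so it does not look like a DOSBox-only quirk, and it also explains why the code pointer still lands correctly. Below is a minimal sketch of building that 8-byte frame by hand (the workaround described above). RM_SEG, to_real_mode and rm_target are hypothetical names, and the sketch assumes, as the post describes, that CS still caches a 32-bit (D=1) descriptor after PE was cleared and that the far return leaves the CPU executing 16-bit code.

```nasm
RM_SEG  equ 0x0000             ; hypothetical real-mode segment of the return target

[bits 32]
to_real_mode:
        ; CR0.PE is already clear, but CS still has D=1 cached,
        ; so the RETF below uses a 32-bit operand size.
        push dword RM_SEG      ; CS slot: 4 bytes, upper 16 bits discarded by RETF
        push dword rm_target   ; EIP: 4 bytes
        retf                   ; pops 8 bytes and reloads CS with RM_SEG

[bits 16]
rm_target:
        ; assumed 16-bit real-mode code; SP is back to its value before the pushes
        jmp $
```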
« Last Edit: February 19, 2023, 09:43:45 PM by ben321 »

ben321
I thought I had found a solution, but I realized I hadn't. The problem above still remains: the RETF from 32-bit real mode to 16-bit real mode pops 8 bytes off the stack instead of the 6 bytes it should.
« Last Edit: February 19, 2023, 11:41:26 PM by ben321 »

ben321
Ok, I finally found a partial solution. It doesn't answer my question about why the RETF instruction seemed to pop 8 bytes off the stack instead of 6, but it does compensate. I needed an operand-size override prefix (0x66) either on the far CALL from 16-bit mode into 32-bit mode, or on the RETF that takes you back from 32-bit mode to 16-bit mode. Note that when you change the far CALL, you also need to set the size of the pointer it uses for the call destination correctly: the pointer's offset must be 32 bits (the segment is still 16 bits), in contrast to the 16-bit offset used by a normal 16-bit far CALL. If you instead change the RETF to a RETFW, which forces the far return running in 32-bit mode to behave as it normally would in 16-bit mode, you don't need to change the pointer and can keep its 16-bit offset field.
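A sketch of the two variants described above, in NASM syntax. The point is that the far CALL and the far return must agree on operand size: both 32-bit (an 8-byte frame: CS in a 4-byte slot plus EIP) or both 16-bit (a 4-byte frame: CS plus IP). CODE32_SEL, the labels and the pointer variables are hypothetical; the sketch assumes PE is already set before the call, interrupts are disabled, each enter_pm_* routine is reached with a near CALL, and, like the post, that the far return leaves the CPU executing 16-bit real-mode code.

```nasm
CODE32_SEL equ 0x08            ; hypothetical selector of a 32-bit code descriptor

; Variant 1: 32-bit far CALL (0x66 prefix) paired with a plain RETF.
[bits 16]
enter_pm_o32:
        o32 call far [ptr32]   ; pushes CS padded to 4 bytes, then EIP: 8 bytes
        ret                    ; back in 16-bit code with SP balanced
ptr32:  dd pm_body32           ; 32-bit offset
        dw CODE32_SEL          ; 16-bit selector

[bits 32]
pm_body32:
        ; ... protected-mode work, then clear CR0.PE ...
        retf                   ; 32-bit operand size: pops EIP + 4-byte CS slot

; Variant 2: ordinary 16-bit far CALL paired with RETFW.
[bits 16]
enter_pm_o16:
        call far [ptr16]       ; pushes CS then IP: 4 bytes
        ret
ptr16:  dw pm_body16           ; 16-bit offset
        dw CODE32_SEL          ; 16-bit selector

[bits 32]
pm_body16:
        ; ... protected-mode work, then clear CR0.PE ...
        retfw                  ; 0x66 prefix: pops IP (2 bytes) + CS (2 bytes)
```

Mixing the variants (a 16-bit CALL with a plain 32-bit RETF) reproduces the 4-byte versus 8-byte mismatch from the first post.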

Deskman243
Yeah, I was trying something similar, testing code between privilege levels, so I had designed a small label for a gate to try to reset the values through the GDT. Instead, I made another segment for my code to go through the steps once I realized that my boot code couldn't make calls from the new real mode. Originally I was setting up a task state segment, and I was happy that all the other functions in that section worked until I called through the IDT pipeline to actually verify the state between protected modes. Here is the source for this code.

Code:
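; BUILD_GDT_DESC packs one 8-byte GDT descriptor:
;   bits 0-15 limit[15:0], bits 16-39 base[23:0], bits 40-47 access byte,
;   bits 48-51 limit[19:16], bits 52-55 flags (55 = G, 54 = D/B, 53 = L, 52 = AVL),
;   bits 56-63 base[31:24]
%define BUILD_GDT_DESC(bounds,base,access,flags) \
((( base & 0x00FFFFFF) << 16) | \
(( base & 0xFF000000) << 32) | \
( bounds & 0x0000FFFF) | \
(( bounds & 0x000F0000) << 32) | \
(( access & 0xFF) << 40) | \
(( flags & 0x0F) << 52))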

TSS_IO_MAP_SIZE EQU 0x400/8

[BITS 16]

section .data

ROWS EQU 25
COLS1 EQU 80

section .text

_prep_module2:
jmp _prep_module2_

Idtpipe32:
dw 0
dd 0



_prep_module2_:
cld
cli

in al,0x92
or al,2
out 0x92,al

; mov [saved_segment],ds

lgdt[gdt32Ptr]
lidt[Idtpipe32]

mov word [gate_voucher],GATE_VOUCHER

protectedGate:
mov eax,cr0
or eax,1
mov cr0,eax

jmp code32_post:protectedMode

;.pm_track:
; mov word [gate_voucher],GATE_CHECK
; jmp protectedGate

[bits 32]
trail:
jmp $
; mov sp,bp
; ret

protectedMode:
mov cx,word [gate_voucher]
cmp cx,0
jz trail

mov dword [saved_segment],data32_post
call task_prod

; mov ebp,tss32_post
; ltr bp

call task_poke

mov esp,edx
mov esp,PM_MODE_STACK

mov ah,COLOR_ATTR_PSC
mov al,ah
mov esi, pm_str

call printstr_pm

; jmp $

prep_stage:
mov ebp,tss

mov edx,0
mov ecx,0
; mov ebp,0
.prep_loop1:
mov [es:bp],dx
add bp,1
cmp bp,0x8298
; cmp bp,0x8319
jnz .prep_loop1

; jmp prototype2

.prep_stage2:
mov word [tss.iomap_base],tss.iomap - tss
mov ebp,tss32_post
ltr bp

_prep_stage1:
; lgdt[gdt16Ptr]
jmp rm_poll_function
; jmp 0:rm_poll_function

;[bits 16]
;use16
;testing states
rm_poll_function:
;use16
cli
xor ecx,ecx
mov ebp,esp

;rmGate:
; jmp 0:rm_poll_state
; jmp cs:rm_poll_state

; use16
; jmp rmGate
; jmp 0:rmGate
jmp code16_post:rmGate


[bits 16]
rm_trail:
ret


rmGate:

mov ecx,data16_post
mov es,cx
; mov cs,cx
mov ds,cx
mov fs,cx
mov gs,cx
mov ss,cx

mov eax,cr0
mov [save_cr0],eax
and eax,07ffffffeh      ; clear PE (bit 0) and PG (bit 31)
mov cr0,eax


; lgdt[gdt32Ptr]
; lidt[Idtpipe]

; mov word [gate_voucher],GATE_VOUCHER
; push word [gate_voucher]
; pop cx
mov cx,1

cmp cx,0
jz rm_trail

; jmp code16_post:rm_poll_state
jmp 0:rm_poll_state     ; far JMP: reload CS with a real-mode segment value

; push cs
; push rm_poll_state
; retf
Idtpipe:
dw 0x3ff
dd 0

Idtpipe2:
dw 0xffff
dd 0

rm_poll_state:



xor eax,eax
xor ecx,ecx

; mov sp,0x0000_9000
mov esp,0x00009000
; mov sp,0x9000


; mov cx,0
mov es,cx
; mov cs,cx
mov ss,cx
mov ds,cx
mov fs,cx
mov gs,cx
; mov sp,0x0000_9000
lidt[Idtpipe2]
sti


jmp retstack
; jmp prototype

mov dword [saved_segment],data16_post
mov word [gate_voucher],GATE_CHECK

call task_prod16
; call task_poke16
call poll_function1

call color_function
; jmp retstack
; jmp prototype

; jmp protectedGate2
; jmp protectedGate.pm_track
jmp _prep_module3

; jmp $

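retstack:
; pushf
push 0x0000              ; CS for the far return (a 2-byte push under [bits 16])
; push byte data16_post
; push byte code16_post
; push byte Idtpipe
push word color_function ; IP for the far return
retf                     ; 16-bit operand size here: pops IP (2 bytes) then CS (2 bytes)
; retfw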

prototype:
xor ax,ax
mov ds,ax
mov ax,[0x40]
mov dx,[0x42]
mov [pre],ax
mov [pre+2],dx
mov word [0x40],handler
mov [0x42],cs
mov ah,0x0e
mov al,'h'
int 0x10
jmp $

handler:
inc ax
jmp far [pre]
pre: dd 0

poll_function1:
.prep:

xor ebx,ebx
xor edx,edx

; mov ax,0
mov cx,0
mov es,cx
mov ax,ROWS
mov bx,[es:0x044A]      ; BIOS data area word at 0040:004A = number of text columns
mul bx
add ax,COLS1
shl ax,1
mov si,ax

; mov cx,0B000H
; test byte es:0x0410,2
; jnz .Address_OK
mov cx,0B800H

.Address_OK:
mov es,cx
mov al,'H'

mov [es:si],al          ; store the character in text-mode video memory (ES = 0xB800)
; mov [es:0],al
inc si
ret
; jmp $

loopcheck:
; hlt
jmp loopcheck


color_function:
; save SP in DX so it can be restored before returning
mov dx,sp

  mov ax,0x13
  int 0x10
 
mov cx,0xA000
mov es,cx

mov word[X],50
mov word[Y],60
call color_method

mov word[X],100
mov word[Y],60
  call color_method

mov word[X],120
mov word[Y],120
  call color_method 
mov word[X],140
mov word[Y],30
  call color_method
mov sp,dx
; hlt
ret                     ; when entered via retstack's RETF (not a CALL), no return address was pushed for this RET
color_method :
  mov ax, word[X]
  mov bx, word[Y]
  xor si, si
add si,word 320
  imul si, bx
  add si, ax
  mov cx, word[color_pkg]
  mov [es:si], cx
 
  ret



task_poke16:

; mov word [gate_voucher],1
;check wizard stall
mov cx,word [gate_voucher]
cmp cx,0
jnz .prep

.prep:
mov eax,[tss.eax]
mov ecx,[tss.ecx]
mov edx,[tss.edx]
mov ebx,[tss.ebx]
mov esp,[tss.esp]
mov ebp,[tss.ebp]
mov esi,[tss.esi]
mov edi,[tss.type2]

mov cx,[tss.ds]
mov es,cx
; mov cs,cx
mov ss,cx
mov ds,cx
mov fs,cx
mov gs,cx

mov cx,word [gate_voucher]
cmp cx,0
jnz .post

; call rmGate

.post:
ret

;%endif

;[bits 32]
task_prod16:
mov cx,word [gate_voucher]
cmp cx,0
jnz .inst_placer
; call protectedGate
.reference:
mov word [tss.ecx],cx
mov word [tss.esp1],sp
mov cx,ss
mov word [tss.ss1],cx

.inst_placer:
pop cx
push cx

mov word [tss.eip],cx
.prep:
; pushd ecx
pushf
; popf dword [tss.eflags]
pop cx
mov word [tss.eflags],cx


mov word [tss.eax],ax
; mov dword [tss.ecx],ecx
mov word [tss.edx],dx
mov word [tss.ebx],bx
mov word [tss.esp],sp
mov word [tss.ebp],bp
mov word [tss.esi],si
mov word [tss.type2],di

push cx
mov cx,[saved_segment]
mov word [tss.es],cx
mov word [tss.cs],cx
mov word [tss.ss],cx
mov word [tss.ds],cx
mov word [tss.fs],cx
mov word [tss.gs],cx

pop cx

ret
;r2pm_backtrack:

_prep_module3:
cld
cli

in al,0x92
or al,2
out 0x92,al

; mov [saved_segment],ds

lgdt[gdt32Ptr]
lidt[Idtpipe32]

mov word [gate_voucher],GATE_VOUCHER

protectedGate2:
mov eax,cr0
or eax,1
mov cr0,eax

jmp code32_post:modeProtect2

;.pm_track:
; mov word [gate_voucher],GATE_CHECK
; jmp protectedGate

[bits 32]
trail2:
jmp $
; mov sp,bp
; ret

modeProtect2:
mov word [saved_segment],data32_post

; call task_poke
mov ecx,[saved_segment]
mov es,cx
; mov cs,cx
mov ss,cx
mov ds,cx
mov fs,cx
mov gs,cx

.loopcheck:
jmp $

printstr_pm:
push ds
; push esi
push eax
; push ebp
mov ebp,0
jmp .gettext
.output:
mov esi,[vidmem_ptr]
add esi,ebp
mov [es:esi],ax
add ebp,2
pop esi
.gettext:
mov al,[ds:si]
add si,1
push esi
test al,al
jnz .output
.stub1:
dec esi
shr ebp,1
add esi,ebp
mov [vidmem_ptr],esi
pop esi
pop eax
pop ds
ret


task_poke:
mov cx,word [gate_voucher]
cmp cx,0
jnz .prep
.prep:
mov eax,[tss.eax]
mov ecx,[tss.ecx]
mov edx,[tss.edx]
mov ebx,[tss.ebx]
mov esp,[tss.esp]
mov ebp,[tss.ebp]
mov esi,[tss.esi]
mov edi,[tss.type2]

mov cx,[tss.ds]
mov es,cx
; mov cs,cx
mov ss,cx
mov ds,cx
mov fs,cx
mov gs,cx

mov cx,word [gate_voucher]
cmp cx,0
jnz .post

; call rmGate

.post:
ret


;%endif

;[bits 32]
task_prod:
mov cx,word [gate_voucher]
cmp cx,0
jnz .inst_placer


mov dword [tss.ecx],ecx
.inst_placer:
pop ecx
push ecx

mov dword [tss.eip],ecx
.prep:
; pushd ecx
pushf
; popf dword [tss.eflags]
pop ecx
mov dword [tss.eflags],ecx


mov dword [tss.eax],eax
; mov dword [tss.ecx],ecx
mov dword [tss.edx],edx
mov dword [tss.ebx],ebx
mov dword [tss.esp],esp
mov dword [tss.ebp],ebp
mov dword [tss.esi],esi
mov dword [tss.type2],edi

push ecx
mov cx,[saved_segment]
mov word [tss.es],cx
mov word [tss.cs],cx
mov word [tss.ss],cx
mov word [tss.ds],cx
mov word [tss.fs],cx
mov word [tss.gs],cx

pop ecx

ret

section .rodata
  __STACK__ dd 0x00FFFFFF
  __HEAP__ dd 0x00008C24

  VGA_MEM dw 0xA000
  WIDTH dw 320
  HEIGHT dw 219
 

VIDEO_TEXT_ADDR EQU 0XB8000
COLOR_ATTR_PSC EQU 0X6A
COLOR_ATTR_BSC EQU 0X5F
PM_MODE_STACK EQU 0X80000
VM_STACK_SEG EQU 0X0000
VM_STACK_OFS EQU 0X0000
VM_CS_SEG EQU 0X0000
FLAGS_VM_CMP EQU 17
FLAGS_CMP1 EQU 1
FLAGS_CMP_IF EQU 9
FLAGS_CMP_IOPL EQU 12
RING0_PROC_STACK_SIZE EQU 2048
;TSS_IO_MAP_SIZE EQU 0
TSS_IO_MAP_SIZE EQU 0x400/8
;VM_STACK_ADDRESS EQU vidmem_address
GATE_CHECK EQU 0
GATE_VOUCHER EQU 1


section .data

align 4
gdt32:
dq BUILD_GDT_DESC(0,0,0,0)
gdt32code:
dq BUILD_GDT_DESC(0x0000ffff,0,10011010b,1100b)
gdt32data:
dq BUILD_GDT_DESC(0x0000ffff,0,10010010b,1100b)
gdt16code:
dq BUILD_GDT_DESC(0x0000ffff,0,10011010b,1000b)
gdt16data:
dq BUILD_GDT_DESC(0x0000ffff,0,10010010b,1000b)

;gdt16tss:dq BUILD_GDT_DESC(0x8230,TSS_SIZE-1,10001001b,0000b) & 0 << 22
gdt32tss:
dq BUILD_GDT_DESC(TSS_SIZE-1,0x8470,10001001b,0000b)
; dq BUILD_GDT_DESC(TSS_SIZE-1,0x9260,10001001b,0000b)
.stub1:

code32_post: equ gdt32code -gdt32
data32_post: equ gdt32data -gdt32
;.stub:
code16_post: equ gdt16code -gdt32
data16_post: equ gdt16data -gdt32

tss32_post: equ gdt32tss -gdt32

gdt32Len: equ $-gdt32
gdt32Ptr: dw gdt32Len-1
dd gdt32


;align 4
vidmem_ptr: dd VIDEO_TEXT_ADDR
pm_str: db 'protected mode string ',0
pm_str_length: equ $-pm_str
vm_str: db 'virtual',0
;vidmem_address: dw 0

;resb 512
times 512 db 0

align 16
tss:
.back_link: dd 0
.esp0: dd 0
.ss0: dd 0
.esp1: dd 0
.ss1: dd 0
.esp2: dd 0
.ss2: dd 0
.cr3: dd 0
.eip: dd 0
.eflags: dd 0
.eax: dd 0
.ecx: dd 0
.edx: dd 0
.ebx: dd 0
.esp: dd 0
.ebp: dd 0
.esi: dd 0
.type2: dd 0
.es: dd 0
.cs: dd 0
.ss: dd 0
.ds: dd 0
.fs: dd 0
.gs: dd 0
.ldt: dd 0
.trap: dw 0
.iomap_base: dw 0

.iomap: TIMES TSS_IO_MAP_SIZE db 0x00

%if TSS_IO_MAP_SIZE > 0
.iomap_pad:db 0xff

%endif
.end:
TSS_SIZE: EQU tss.end -tss

save_cr0 dd 0
saved_segment dd 0      ; needs its own 4 bytes: the code stores a dword here, which would otherwise overwrite gate_voucher
gate_voucher dw 1

X dd 0
Y   dd 0
color_pkg dw 5


I wrote a retstack method to try to answer this, but I get more of the same results. It would check both methods, so hopefully this post gets a response too.
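A sketch of how a retstack-style test could check both frame sizes directly: each routine builds a far-return frame sized for its own RETF, lands on a local label in the same segment, and compares SP with its starting value, so ZF reports whether the stack stayed balanced. The names check_retf16, check_retf32 and sp_before are hypothetical; the sketch assumes 16-bit real-mode code, DS addressing sp_before, and that each routine is entered with a near CALL. check_retf32 clobbers EAX.

```nasm
[bits 16]
check_retf16:
        mov [sp_before], sp
        push cs                    ; CS: 2 bytes
        push word .landed          ; IP: 2 bytes
        retf                       ; 16-bit far return pops 4 bytes
.landed:
        cmp sp, [sp_before]        ; ZF=1 if the whole frame was consumed
        ret                        ; RET does not change ZF

check_retf32:
        mov [sp_before], sp
        xor eax, eax
        mov ax, cs
        push eax                   ; CS in a 4-byte slot
        push dword .landed         ; EIP: 4 bytes
        o32 retf                   ; 32-bit far return pops 8 bytes
.landed:
        cmp sp, [sp_before]
        ret

sp_before: dw 0
```

A caller would use `call check_retf32` followed by a `jne` to a mismatch handler; running both checks from the same state separates an operand-size mismatch from other stack corruption.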