Topic: What's wrong with this code?  (Read 8234 times)

ben321

  • Full Member
  • Posts: 182
What's wrong with this code?
« on: January 29, 2019, 06:22:39 AM »
I need help debugging this. I compiled it to a raw binary and saved it as a .COM file for use in DOS. I tested it in the DOSBox debugger version. It keeps failing, no matter what I do to try to fix it.
Ideally, it will get to the "start32" label, be running in 32-bit protected mode at that point, and then stay in that loop. Not too impressive, but it will be a HUGE first step in understanding 32-bit protected mode. It would be a great help if you could tell me what I'm doing wrong here, because it's not working so far. Note that the 0x1FE0 I add as the segment's base address comes from the fact that, in my setup, the COM file loads at real-mode segment 0x1FE (which is multiplied by 16 to get the address of that segment).
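The 0x1FE0 figure is plain real-mode address arithmetic: the linear address is the segment value shifted left by 4 bits, plus the offset. A minimal Python model of it, assuming the load segment 0x1FE seen under DOSBox here (DOS does not guarantee any particular load segment, so real code would read CS at run time; the function name is illustrative):

```python
def real_mode_linear(segment: int, offset: int) -> int:
    """Real-mode address formation: segment * 16 + offset."""
    return (segment << 4) + offset


LOAD_SEGMENT = 0x1FE  # assumption: the segment this .COM file got under DOSBox
ORG = 0x100           # a .COM image starts at offset 0x100 of its segment (hence "org 0x100")

print(hex(real_mode_linear(LOAD_SEGMENT, 0)))    # 0x1fe0: base address of the segment
print(hex(real_mode_linear(LOAD_SEGMENT, ORG)))  # 0x20e0: linear address of "start"
```

Because NASM label values already include the 0x100 from ORG, adding the segment base to a label gives that label's linear address.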

Code:
org 0x100


USE16 ;use 16bit code
start:
cli ;disable interrupts
lgdt [GDTP] ;Point the LGDT instruction to the location in memory where the pointer (GDTP, aka GDT Pointer) exists that points to the GDT

;perform procedure to put the CPU into Protected Mode
mov eax,cr0
or al,1
mov cr0,eax

jmp 1:(start32+0x1FE0) ;In the memory space in protected mode, the function start32 starts at its label's address + the segment address (which is 16 times the real-mode segment value of 0x1FE)









USE32 ;32bit code starts here
start32:
jmp start32







GDTP: ;GDT pointer
dw 24 ;GDT is 24 bytes in size
dd GDT+0x1FE0 ;In absolute memory space, the GDT starts at its label's address + the segment address (which is 16 times the segment value of 0x1FE)



;The following GDT sets the code and data segments to be the same, and makes it start at the base address of 0 and occupy all 4GB possible for 32bit memory

GDT: ;Copied GDT from Wikipedia
; offset 0x0
; null descriptor:
dq 0

; offset 0x8
; code: ; cs should point to this descriptor
dw 0xffff ; segment limit first 0-15 bits
dw 0 ; base first 0-15 bits
db 0 ; base 16-23 bits
db 0x9a ; access byte
db 11001111b ; high 4 bits (flags) low 4 bits (limit 4 last bits)(limit is 20 bit wide)
db 0 ; base 24-31 bits

; offset 0x10
; data: ; ds, ss, es, fs, and gs should point to this descriptor
dw 0xffff ; segment limit first 0-15 bits
dw 0 ; base first 0-15 bits
db 0 ; base 16-23 bits
db 0x92 ; access byte
db 11001111b ; high 4 bits (flags) low 4 bits (limit 4 last bits)(limit is 20 bit wide)
db 0 ; base 24-31 bits

« Last Edit: January 29, 2019, 06:28:35 AM by ben321 »

Frank Kotler

  • NASM Developer
  • Hero Member
  • Posts: 2667
  • Country: us
Re: What's wrong with this code?
« Reply #1 on: January 29, 2019, 07:57:33 PM »
Hi ben321,

In PM, the segment registers are no longer multiplied by 16, but are indexes into the descriptor table.

Try this: Warning: untested code!

Code:
org 0x100


USE16 ;use 16bit code
start:
cli ;disable interrupts
lgdt [GDTP] ;Point the LGDT instruction to the location in memory where the pointer (GDTP, aka GDT Pointer) exists that points to the GDT

;perform procedure to put the CPU into Protected Mode
mov eax,cr0
or al,1
mov cr0,eax

jmp 8:start32
;In the memory space in protected mode, the function start32 starts at its label's address + the segment address (which is 16 times the real-mode segment value of 0x1FE)

; Not once you've set bit 0 of CR0 to 1! From then on, what is in the segment registers is a "selector": an index into the descriptor table.









USE32 ;32bit code starts here
start32:
jmp start32







GDTP: ;GDT pointer
dw 24 ;GDT is 24 bytes in size
dd GDT+0x1FE0 ;In absolute memory space, the GDT starts at its label's address + the segment address (which is 16 times the segment value of 0x1FE)



;The following GDT sets the code and data segments to be the same, and makes it start at the base address of 0 and occupy all 4GB possible for 32bit memory

GDT: ;Copied GDT from Wikipedia
; offset 0x0
; null descriptor:
dq 0

; offset 0x8
; code: ; cs should point to this descriptor
dw 0xffff ; segment limit first 0-15 bits
dw 0 ; base first 0-15 bits
db 0 ; base 16-23 bits
db 0x9a ; access byte
db 11001111b ; high 4 bits (flags) low 4 bits (limit 4 last bits)(limit is 20 bit wide)
db 0 ; base 24-31 bits

; offset 0x10
; data: ; ds, ss, es, fs, and gs should point to this descriptor
dw 0xffff ; segment limit first 0-15 bits
dw 0 ; base first 0-15 bits
db 0 ; base 16-23 bits
db 0x92 ; access byte
db 11001111b ; high 4 bits (flags) low 4 bits (limit 4 last bits)(limit is 20 bit wide)
db 0 ; base 24-31 bits

Good luck!

Best,
Frank




ben321

  • Full Member
  • Posts: 182
Re: What's wrong with this code?
« Reply #2 on: January 30, 2019, 05:59:36 AM »
I haven't run your code yet, but by looking at it, I can tell it has a problem. You changed my "jmp 1:start32" to "jmp 8:start32". But that is a problem. The index means the "entry number" of the GDT, not the "byte number" from the start of the GDT. Your jmp 8:start32 means "get entry number 8 in the GDT". Yes, the first entry starts at StartOfGDT+8 bytes, but it is entry number 1 (entry number 0 is always blank). So even without testing your code, I can tell it won't work. The CPU automatically multiplies the index number by 8 internally, so you don't need to calculate the byte number for that GDT entry.

At least that's how I believe it works.

ben321

  • Full Member
  • Posts: 182
Re: What's wrong with this code?
« Reply #3 on: January 30, 2019, 06:22:06 AM »
Ok, it turns out we were both partly right. You were right that the index represented by the segment registers was a BYTE offset into the GDT (not the entry number in the GDT).
I was correct in assuming (when doing the far jump) that an offset needed to be added to the label. The offset is the program's starting segment number (as used in real mode) times 16. The reason is that when DOS runs a COM file, there is a bunch of stuff in the memory space before the COM file (everything before segment 0x1FE), and that stuff is DOS itself. Unless you want to far jump into DOS's own code (and crash the OS), you NEED that offset added to the label "start32". The final version of the far jump that I have tested (and I know it works) is this:
Code:
jmp 8:(start32+0x1FE0)
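In protected mode (paging off) the CPU forms the linear address as descriptor base + offset. With the flat code descriptor (base 0) the offset therefore has to be the label's real linear address, which is what the added 0x1FE0 provides; a code descriptor whose base were the load address would let the bare label work instead. A Python sketch of that arithmetic, with illustrative function names, a hypothetical START32 value, and the assumption that the program was loaded at segment 0x1FE:

```python
def pm_linear(descriptor_base: int, offset: int) -> int:
    """Protected mode, paging off: linear address = descriptor base + offset (32-bit wrap)."""
    return (descriptor_base + offset) & 0xFFFFFFFF


def far_jump_offset(load_segment: int, label: int, descriptor_base: int = 0) -> int:
    """Offset for 'jmp selector:offset' so that execution lands on `label`.

    load_segment is the real-mode CS the program was loaded at; label is the
    NASM value of the label (it already includes the ORG 0x100).
    """
    target = (load_segment << 4) + label   # where the label sits in memory
    offset = target - descriptor_base      # the CPU adds the base back in
    # The jump is assembled in USE16 code, so its offset field is only 16 bits.
    if not 0 <= offset <= 0xFFFF:
        raise ValueError("target not reachable with a 16-bit far jump offset")
    return offset


START32 = 0x115  # hypothetical value of the start32 label; check your listing

# Flat code descriptor (base 0): the offset has to be the full linear address.
print(hex(far_jump_offset(0x1FE, START32)))                          # 0x20f5
# Code descriptor with base = CS*16: the bare label value works as the offset.
print(hex(far_jump_offset(0x1FE, START32, descriptor_base=0x1FE0)))  # 0x115
```

Either way the jump reaches the same linear address; the 16-bit offset limit is why this trick only works while the program sits below 64 KiB.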
So my complete working code is now this:
Code:
org 0x100


USE16 ;use 16bit code
start:
cli ;disable interrupts
lgdt [GDTP] ;Point the LGDT instruction to the location in memory where the pointer (GDTP, aka GDT Pointer) exists that points to the GDT

;perform procedure to put the CPU into Protected Mode
mov eax,cr0
or al,1
mov cr0,eax

jmp 8:(start32+0x1FE0)
;In the memory space in protected mode, the function start32 starts at its label's address + the segment address (which is 16 times the real-mode segment value of 0x1FE)

; Not once you've set bit 0 of CR0 to 1! From then on, what is in the segment registers is a "selector": an index into the descriptor table.









USE32 ;32bit code starts here
start32:
nop
nop
nop
jmp start32









GDTP: ;GDT pointer
dw 23 ;GDT limit = size in bytes (24) minus 1, as LGDT expects
dd GDT+0x1FE0 ;In absolute memory space, the GDT starts at its label's address + the segment address (which is 16 times the segment value of 0x1FE)



;The following GDT sets the code and data segments to be the same, and makes it start at the base address of 0 and occupy all 4GB possible for 32bit memory

GDT: ;Copied GDT from Wikipedia
; offset 0x0
; null descriptor:
dq 0

; offset 0x8
; code: ; cs should point to this descriptor
dw 0xffff ; segment limit first 0-15 bits
dw 0 ; base first 0-15 bits
db 0 ; base 16-23 bits
db 0x9a ; access byte
db 11001111b ; high 4 bits (flags) low 4 bits (limit 4 last bits)(limit is 20 bit wide)
db 0 ; base 24-31 bits

; offset 0x10
; data: ; ds, ss, es, fs, and gs should point to this descriptor
dw 0xffff ; segment limit first 0-15 bits
dw 0 ; base first 0-15 bits
db 0 ; base 16-23 bits
db 0x92 ; access byte
db 11001111b ; high 4 bits (flags) low 4 bits (limit 4 last bits)(limit is 20 bit wide)
db 0 ; base 24-31 bits
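The descriptor bytes above can be cross-checked with a short Python packer that follows the field layout in the comments (the function name is illustrative). Flags 0xC sets G (4 KiB granularity) and D/B (32-bit default size), which is what stretches the 20-bit limit 0xFFFFF to 4 GiB. LGDT's limit word is the table size minus 1, which is 23 for this three-entry table (a limit of 24 is one byte larger than needed, harmless here).

```python
import struct


def gdt_entry(base: int, limit: int, access: int, flags: int) -> bytes:
    """Pack one 8-byte GDT descriptor.

    limit is 20 bits wide; flags is the 4-bit nibble (G, D/B, L, AVL) that
    shares byte 6 with limit bits 16-19.
    """
    return struct.pack(
        "<HHBBBB",
        limit & 0xFFFF,                                 # limit bits 0-15
        base & 0xFFFF,                                  # base bits 0-15
        (base >> 16) & 0xFF,                            # base bits 16-23
        access,                                         # access byte
        ((flags & 0xF) << 4) | ((limit >> 16) & 0xF),   # flags | limit bits 16-19
        (base >> 24) & 0xFF,                            # base bits 24-31
    )


gdt = (
    bytes(8)                                  # null descriptor
    + gdt_entry(0, 0xFFFFF, 0x9A, 0xC)        # code: present, ring 0, execute/read
    + gdt_entry(0, 0xFFFFF, 0x92, 0xC)        # data: present, ring 0, read/write
)
gdtr_limit = len(gdt) - 1                     # LGDT's limit word = size - 1

print(gdt[8:16].hex(), gdtr_limit)            # ffff0000009acf00 23
```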


Now the next step is to figure out the easiest (fewest lines of code) way to enable the A20 line, for COMPLETE access to the full 32-bit memory space.

Note that in all the online examples I've seen for protected mode, they always write code with the assumption that one will back out of protected mode at some point, and go into real mode again. The result is a huge amount of code written to save the state of real mode, and prepare the CPU for the ability to reenter real mode. This is not my intent, and I consider that extra code nothing but junk code. The problem is what parts of it are junk, and what parts are required? My intent is to write a 32bit application that initiates protected mode, jumps into protected mode, and then runs in protected mode until the system is reset (reboot by poweroff on real hardware, or DosBox is closed and restarted when using DosBox). This should HUGELY simplify the code, but there's one problem. I can't find even ONE piece of sample code online where the person is entering protected mode with the intent to not leave protected mode. So I have NOTHING to look at to guide my programming efforts. I hope somebody here will be able to help me.
« Last Edit: January 30, 2019, 06:30:51 AM by ben321 »

fredericopissarra

  • Full Member
  • Posts: 368
  • Country: br
Re: What's wrong with this code?
« Reply #4 on: January 30, 2019, 07:15:42 PM »
Unlike real-mode segment values, segment selectors in i386 protected mode have a structure. The first 2 bits represent the requested privilege level (RPL) and bit 3 says whether the GDT or the LDT will be used. So the index must be shifted 3 bits to the left.

In "JMP 8:_start32" the segment part specifies index 1, but also RPL=0 and GDT (0).

ben321

  • Full Member
  • Posts: 182
Re: What's wrong with this code?
« Reply #5 on: January 30, 2019, 07:58:26 PM »
Bit 3 says if GDT or LDT is used? Don't you mean bit 2? Bits are labeled from bit number 0 (least significant bit), so the 3rd bit is called bit 2.

fredericopissarra

  • Full Member
  • Posts: 368
  • Country: br
Re: What's wrong with this code?
« Reply #6 on: February 01, 2019, 10:57:01 AM »
Yep... sorry... bit 2... I mean 3rd bit...
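Putting the exchange together: a selector is neither a plain entry number nor a plain byte offset. Bits 15-3 hold the descriptor index, bit 2 (the third bit) is the table indicator TI (0 = GDT, 1 = LDT), and bits 1-0 are the RPL, so selector 8 means GDT entry 1 at RPL 0. It also shows why the original `jmp 1:...` failed: selector 1 is index 0, the null descriptor, and loading CS with a null selector faults. A small Python sketch (function names are illustrative):

```python
from typing import Tuple


def make_selector(index: int, ti: int = 0, rpl: int = 0) -> int:
    """Pack a selector: bits 15-3 = descriptor index, bit 2 = TI, bits 1-0 = RPL."""
    if not (0 <= index < 8192 and ti in (0, 1) and 0 <= rpl <= 3):
        raise ValueError("field out of range")
    return (index << 3) | (ti << 2) | rpl


def split_selector(sel: int) -> Tuple[int, int, int]:
    """Inverse of make_selector: returns (index, ti, rpl)."""
    return sel >> 3, (sel >> 2) & 1, sel & 3


print(make_selector(1))    # 8: GDT entry 1 (the code descriptor), RPL 0
print(make_selector(2))    # 16 (0x10): GDT entry 2 (the data descriptor)
print(split_selector(1))   # (0, 0, 1): the null descriptor with RPL 1
```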

debs3759

  • Global Moderator
  • Full Member
  • Posts: 221
  • Country: gb
    • GPUZoo
Re: What's wrong with this code?
« Reply #7 on: February 02, 2019, 12:12:15 AM »
My code to test and enable A20 is:

Code:
%define KBC_Control 060h ; KBC data port
%define KBC_Status 064h ; KBC status port (read) / command port (write)

%define KBC_in_buf_full 002h ; Input buffer full
%define KBC_enable_A20 0DFh ; enable A20 command


;-------------------------------------------------------------------------------+
; EnableA20: +
; +
; input: none +
; +
; output: CF A20 is set +
; NC A20 is not set +
; +
; This routine first tests whether A20 is set. If not, it tries to set it using +
; Port 92 (Fast A20). If that fails, it then writes to the keyboard +
; controller to set it. +
; +
; If A20 is enabled the routine returns with CF set, else CF is cleared. +
;-------------------------------------------------------------------------------+

section .text

EnableA20:
push ax
push dx

call TestA20 ; is A20 set?
jc .1 ; yes, exit

in al,092h ; read Port 92h
or al,02h ; Set A20 bit
and al,0FEh ; keep bit 0 (fast reset) clear, or the write resets the machine
out 092h,al ; write it - set A20

call TestA20 ; Test A20 again...
jc .1 ; if set, exit

mov ah,KBC_enable_A20
call KBC_GateA20

call TestA20 ; Test A20 again...

.1:
pop dx ; pop in reverse order of the pushes
pop ax

ret

;-------------------------------------------------------------------------------+
; TestA20: +
; +
; input: none +
; +
; output: CF A20 is set +
; NC A20 is not set +
; +
; This routine tests whether A20 is set or not. +
;-------------------------------------------------------------------------------+

section .text

TestA20:
push ds
push es

push ax ; used a few times...
push bx ; likewise....
push si
push di

xor si,si
xor di,di
mov ds,si ; DS:SI points to Int 0 vector

mov ax,0FFFFh
mov es,ax
mov di,010h ; ES:DI points to:
; 0x0000  if A20 not set
; 0x10000 if A20 is set
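cli
cld ; lodsw/stosw must step forwards

push word [ES:DI] ; save the word at FFFF:0010 (linear 0x100000 when A20 is on)
push word [DS:0] ; save Int 0 vector

lodsw ; ax contains int 0 vector
mov bx,ax ; save in bx
dec ax ; decrement ax
stosw ; and save to ES:DI

xor si,si
xor di,di
lodsw ; read DS:SI again

cmp ax,bx ; if A20 is set, ZF is set

pop word [DS:0] ; restore Int 0 vector
pop word [ES:010h] ; restore FFFF:0010 (overwritten by stosw when A20 is on)

sti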

clc
jnz .1 ; if ZF
stc ; then set CF
.1:

pop di
pop si
pop bx
pop ax

pop es
pop ds

ret

;-------------------------------------------------------------------------------+
; KBC_Wait: +
; +
; input: none +
; +
; output: NC = buffer is empty +
; CF = buffer is not empty +
;-------------------------------------------------------------------------------+

section .text

KBC_Wait:
push ax ; save ax
push bx ; save bx
push cx ; save cx
mov bx,5 ; set super-long timeout
xor cx,cx ; cx=0:  timeout value
.1:
out 0edh,ax ; I/O delay
in al,KBC_Status ; read 8042 status port
and al,KBC_in_buf_full ; input buffer full flag (D1)
loopnz .1 ; loop until input buffer empty
;   or timeout
jz .2 ; success!
dec bx ; are we done yet?
jnz .1 ; keep trying
or al,al ; success on last try?
.2:
clc
jz .3
stc
.3:
pop cx ; restore cx
pop bx ; restore bx
pop ax ; restore ax
ret

;-------------------------------------------------------------------------------+
; KBC_GateA20: +
; +
; input: ah = command to send to KBC +
; +
; output: NC = Command succeeded +
; CF = Command failed +
;-------------------------------------------------------------------------------+

section .text

KBC_GateA20:
pushf ; save interrupt status
cli ; disable ints while using 8042

Call KBC_Wait ; ensure 8042 input buffer is empty
jc A20_Fail ; ret: 8042 unable to accept cmd

out 0edh,ax ; I/O delay
mov al,0D1h ; 8042 cmd to write output port
out KBC_Status,al ; output cmd to 8042

Call KBC_Wait ; wait for 8042 to accept cmd
jc A20_Fail ; ret: 8042 unable to accept cmd

mov al,ah ; 8042 port data
out KBC_Control,al ; output port data to 8042

Call KBC_Wait ; wait for 8042 to accept port data
jc A20_Fail

push cx ; save CX
mov cx,14h ;
.DLY:
out 0edh,ax ; Wait for KBC to execute the
loop .DLY ;  command.  (about 25uS)
pop cx ; restore CX

clc

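A20_Fail:
jc .fail ; CF holds the result; popf would overwrite it
popf ; restore flags (interrupt state)
clc ; NC = command succeeded
ret
.fail:
popf ; restore flags (interrupt state)
stc ; CF = command failed
ret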

It's not compact, as it contains lots of checks, but if you need to remove the checks to make it more compact, that should be easy enough. The I/O delays are needed on some systems; without them your code may be too fast for the keyboard controller (KBC).
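TestA20 relies on the fact that, with the A20 gate closed, address line 20 is forced to 0, so FFFF:0010 (linear 0x100000) lands on the same memory as 0000:0000. If a write through one address shows up when reading the other, A20 is off. A Python model of that wraparound (the function name is illustrative):

```python
def physical_address(segment: int, offset: int, a20_enabled: bool) -> int:
    """Real-mode address as it reaches memory; a closed A20 gate forces bit 20 to 0."""
    linear = (segment << 4) + offset          # can reach 0x10FFEF (FFFF:FFFF)
    return linear if a20_enabled else linear & ~(1 << 20)


# TestA20 compares 0000:0000 with FFFF:0010.
print(hex(physical_address(0xFFFF, 0x10, a20_enabled=False)))  # 0x0: same cell as 0000:0000
print(hex(physical_address(0xFFFF, 0x10, a20_enabled=True)))   # 0x100000: a different cell
```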
My graphics card database: www.gpuzoo.com

ben321

  • Full Member
  • Posts: 182
Re: What's wrong with this code?
« Reply #8 on: February 02, 2019, 04:43:59 AM »
Is there a way to do it without the keyboard controller? Given that 32-bit protected mode is an intentional feature of the CPU, you'd think Intel would have provided a more direct route to activate this mode than requiring a hacky trick with the KBC. I'd think they'd make it so that activating 32-bit protected mode automatically activated the A20 gate, since there is no circumstance in which you would need to run 32-bit protected mode with the A20 gate disabled (the whole A20 gate thing is only for 16-bit legacy modes anyway). In 32-bit protected mode, you will always want to activate the A20 gate.

And even if the A20 gate requires a separate action to activate it, you'd think Intel would create an opcode directly in the CPU for this VERY IMPORTANT operation, so that one line of assembly code could activate the A20 gate. Or at the very least you'd think it would be something that could be activated via the correct output port using an OUT instruction, or via the correct BIOS interrupt with the INT instruction.

debs3759

  • Global Moderator
  • Full Member
  • Posts: 221
  • Country: gb
    • GPUZoo
Re: What's wrong with this code?
« Reply #9 on: February 02, 2019, 05:56:06 PM »
I think port 92 is independent of the KBC. It's certainly much faster than the older method, but it's not supported on all older systems. This link will tell you more: https://www.win.tue.nl/~aeb/linux/kbd/A20.html
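On systems that implement it, port 92h (often called System Control Port A) uses bit 1 for A20 and bit 0 for a fast CPU reset, so a read-modify-write must write bit 0 back as 0, and it is common to skip the write when A20 is already on. A Python sketch of choosing the byte to write (names are illustrative; in assembly this is the `in al,092h` / `or al,02h` / `out 092h,al` sequence, with bit 0 cleared before the `out`):

```python
from typing import Optional

PORT92_FAST_RESET = 0x01  # bit 0: writing a 1 here triggers a fast CPU reset
PORT92_A20 = 0x02         # bit 1: 1 = A20 gate open


def port92_write_value(current: int) -> Optional[int]:
    """Byte to write back to port 92h to open A20, or None if it is already open."""
    if current & PORT92_A20:
        return None  # already enabled: skip the write entirely
    return (current | PORT92_A20) & ~PORT92_FAST_RESET & 0xFF


print(port92_write_value(0x00))  # 2
print(port92_write_value(0x02))  # None
```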