Author Topic: What's wrong with this code? (Read 38779 times)

ben321 · « **on:** January 29, 2019, 06:22:39 AM »

I need help debugging this. I assembled it to a raw binary and saved it as a .COM file for use in DOS, then tested it in the debugger build of DOSBox. It keeps failing, no matter what I do to try to fix it.
Ideally, it will reach the "start32" label running in 32-bit protected mode and then stay in that loop. Not too impressive, but it will be a HUGE first step in getting to understand 32-bit protected mode. It would be a great help if you could tell me what I'm doing wrong here, because it's not working so far. Note that the 0x1FE0 I add in the code is the linear address of the real-mode segment: when the COM file loads, it loads at real-mode segment 0x1FE, and the segment number is multiplied by 16 to get the address of that segment.

Code: [Select]

org 0x100


USE16 ;use 16bit code
start:
	cli ;disable interrupts
	lgdt [GDTP] ;Point the LGDT instruction to the location in memory where the pointer (GDTP, aka GDT Pointer) exists that points to the GDT

	;perform procedure to put the the CPU into Protected Mode
	mov eax,cr0
	or al,1
	mov cr0,eax

	jmp 1:(start32+0x1FE0) ;In the memory space in protected mode, the function start32 starts at its label's address + the segment address (which is 16 times the real-mode segment value of 0x1FE)
			








USE32 ;32bit code starts here
start32:
	jmp start32







GDTP: ;GDT pointer
	dw 24 ;GDT is 24 bytes in size
	dd GDT+0x1FE0 ;In absolute memory space, the GDT starts at its label's address + the segment address (which is 16 times the segment value of 0x1FE)



;The following GDT sets the code and data segments to be the same, and makes it start at the base address of 0 and occupy all 4GB possible for 32bit memory

GDT: ;Copied GDT from from Wikipedia
; offset 0x0
; null descriptor:
	dq 0

; offset 0x8
; code:				; cs should point to this descriptor
	dw 0xffff		; segment limit first 0-15 bits
	dw 0			; base first 0-15 bits
	db 0			; base 16-23 bits
	db 0x9a			; access byte
	db 11001111b		; high 4 bits (flags) low 4 bits (limit 4 last bits)(limit is 20 bit wide)
	db 0			; base 24-31 bits

; offset 0x10
; data:				; ds, ss, es, fs, and gs should point to this descriptor
	dw 0xffff		; segment limit first 0-15 bits
	dw 0			; base first 0-15 bits
	db 0			; base 16-23 bits
	db 0x92			; access byte
	db 11001111b	; high 4 bits (flags) low 4 bits (limit 4 last bits)(limit is 20 bit wide)
	db 0			; base 24-31 bits

Frank Kotler · « **Reply #1 on:** January 29, 2019, 07:57:33 PM »

Hi ben321,

In PM, the segment registers are no longer multiplied by 16, but are indexes into the descriptor table.

Try this: Warning: untested code!

Code: [Select]

org 0x100


USE16 ;use 16bit code
start:
	cli ;disable interrupts
	lgdt [GDTP] ;Point the LGDT instruction to the location in memory where the pointer (GDTP, aka GDT Pointer) exists that points to the GDT

	;perform procedure to put the the CPU into Protected Mode
	mov eax,cr0
	or al,1
	mov cr0,eax

	jmp 8:start32
 ;In the memory space in protected mode, the function start32 starts at its label's address + the segment address (which is 16 times the real-mode segment value of 0x1FE)
			
; not once you've set bit 0 of cr0 to 1! Then. what is in the segment registers is a "selector" - an index into the descriptor table.









USE32 ;32bit code starts here
start32:
	jmp start32







GDTP: ;GDT pointer
	dw 24 ;GDT is 24 bytes in size
	dd GDT+0x1FE0 ;In absolute memory space, the GDT starts at its label's address + the segment address (which is 16 times the segment value of 0x1FE)



;The following GDT sets the code and data segments to be the same, and makes it start at the base address of 0 and occupy all 4GB possible for 32bit memory

GDT: ;Copied GDT from from Wikipedia
; offset 0x0
; null descriptor:
	dq 0

; offset 0x8
; code:				; cs should point to this descriptor
	dw 0xffff		; segment limit first 0-15 bits
	dw 0			; base first 0-15 bits
	db 0			; base 16-23 bits
	db 0x9a			; access byte
	db 11001111b		; high 4 bits (flags) low 4 bits (limit 4 last bits)(limit is 20 bit wide)
	db 0			; base 24-31 bits

; offset 0x10
; data:				; ds, ss, es, fs, and gs should point to this descriptor
	dw 0xffff		; segment limit first 0-15 bits
	dw 0			; base first 0-15 bits
	db 0			; base 16-23 bits
	db 0x92			; access byte
	db 11001111b	; high 4 bits (flags) low 4 bits (limit 4 last bits)(limit is 20 bit wide)
	db 0			; base 24-31 bits

Good luck!

Best,
Frank

ben321 · « **Reply #2 on:** January 30, 2019, 05:59:36 AM »

Quote from: Frank Kotler on January 29, 2019, 07:57:33 PM

Hi ben321,

In PM, the segment registers are no longer multiplied by 16, but are indexes into the descriptor table.


I haven't run your code yet, but by looking at it, I can tell it has a problem. You changed my "jmp 1:start32" to "jmp 8:start32". But that is a problem. The index means the "entry number" of the GDT, not the "byte number" from the start of the GDT. Your jmp 8:start32 means "get entry number 8 in the GDT". Yes, the first entry starts at StartOfGDT+8 bytes, but it is entry number 1 (entry number 0 is always blank). So even without testing your code, I can tell it won't work. The CPU automatically multiplies the index number by 8 internally, so you don't need to calculate the byte number for that GDT entry.

At least that's how I believe it works.

ben321 · « **Reply #3 on:** January 30, 2019, 06:22:06 AM »

Ok, it turns out we were both partly right. You were right that the value loaded into the segment register is a BYTE offset into the GDT (not the entry number in the GDT).
I was right in assuming (when doing the far jump) that an offset needed to be added to the label. The offset is the program's starting segment number (as used in real mode) times 16. The reason is that when DOS runs a COM file, there is a bunch of stuff in memory before the COM file (everything below segment 0x1FE), and that stuff is DOS itself. Unless you want to far jump into DOS's own code (and crash the OS), you NEED that offset added to the label "start32". The final version of the far jump that I have tested (and I know it works) is this:

Code: [Select]

jmp 8:(start32+0x1FE0)

So my complete working code is now this:

Code: [Select]

org 0x100


USE16 ;use 16bit code
start:
	cli ;disable interrupts
	lgdt [GDTP] ;Point the LGDT instruction to the location in memory where the pointer (GDTP, aka GDT Pointer) exists that points to the GDT

	;perform procedure to put the the CPU into Protected Mode
	mov eax,cr0
	or al,1
	mov cr0,eax

	jmp 8:(start32+0x1FE0)
 ;In the memory space in protected mode, the function start32 starts at its label's address + the segment address (which is 16 times the real-mode segment value of 0x1FE)
			
; not once you've set bit 0 of cr0 to 1! Then. what is in the segment registers is a "selector" - an index into the descriptor table.









USE32 ;32bit code starts here
start32:
	nop
	nop
	nop
	jmp start32




	
	



GDTP: ;GDT pointer
	dw 23 ;GDT is 24 bytes in size; the limit field holds the size minus 1
	dd GDT+0x1FE0 ;In absolute memory space, the GDT starts at its label's address + the segment address (which is 16 times the segment value of 0x1FE)



;The following GDT sets the code and data segments to be the same, and makes it start at the base address of 0 and occupy all 4GB possible for 32bit memory

GDT: ;Copied GDT from from Wikipedia
; offset 0x0
; null descriptor:
	dq 0

; offset 0x8
; code:				; cs should point to this descriptor
	dw 0xffff		; segment limit first 0-15 bits
	dw 0			; base first 0-15 bits
	db 0			; base 16-23 bits
	db 0x9a			; access byte
	db 11001111b		; high 4 bits (flags) low 4 bits (limit 4 last bits)(limit is 20 bit wide)
	db 0			; base 24-31 bits

; offset 0x10
; data:				; ds, ss, es, fs, and gs should point to this descriptor
	dw 0xffff		; segment limit first 0-15 bits
	dw 0			; base first 0-15 bits
	db 0			; base 16-23 bits
	db 0x92			; access byte
	db 11001111b	; high 4 bits (flags) low 4 bits (limit 4 last bits)(limit is 20 bit wide)
	db 0			; base 24-31 bits
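The constant 0x1FE0 only holds when DOS happens to load the program at segment 0x1FE. A minimal sketch of one way to avoid hard-coding it, assuming a COM file (so CS = DS) and using hypothetical label names (`gdt_ptr`, `pm_offset`, `flush`): compute the linear base from CS at run time and patch both the GDT pointer and the far-jump offset before switching modes. The far jump is hand-encoded so that it carries a full 32-bit offset.

```nasm
; Sketch only, not the thread's tested code. Label names are hypothetical.
org 0x100
bits 16
start:
	cli
	xor	eax, eax
	mov	ax, cs			; segment DOS loaded this COM file at (CS = DS in a COM file)
	shl	eax, 4			; linear base of that segment = segment * 16
	mov	ebx, eax		; keep a copy of the linear base
	add	eax, gdt		; linear address of the GDT
	mov	[gdt_ptr + 2], eax	; patch the base field of the GDT pointer
	add	ebx, start32		; linear address of the 32-bit entry point
	mov	[pm_offset], ebx	; patch the offset of the far jump below
	jmp	short flush		; assumption: discards stale prefetched bytes on old CPUs
flush:
	lgdt	[gdt_ptr]
	mov	eax, cr0
	or	al, 1			; set CR0.PE
	mov	cr0, eax
	db	0x66, 0xEA		; hand-encoded far jump with a 32-bit offset (jmp ptr16:32)
pm_offset:
	dd	0			; offset, patched above
	dw	0x08			; selector of the code descriptor

bits 32
start32:
	jmp	start32

gdt_ptr:
	dw	gdt_end - gdt - 1	; limit = size in bytes minus 1
	dd	0			; linear base, patched above

gdt:
	dq	0			; null descriptor
	dw	0xffff, 0		; code: limit 15..0, base 15..0
	db	0, 0x9a, 11001111b, 0	; base 23..16, access, flags + limit 19..16, base 31..24
	dw	0xffff, 0		; data: same layout
	db	0, 0x92, 11001111b, 0
gdt_end:
```

Computing the limit from the `gdt_end` label also keeps the pointer correct if more descriptors are added later.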

Now the next step is to figure out the easiest way (fewest lines of code) to enable the A20 line, for COMPLETE access to the full 32-bit memory space.

Note that in all the online examples I've seen for protected mode, they always write code with the assumption that one will back out of protected mode at some point, and go into real mode again. The result is a huge amount of code written to save the state of real mode, and prepare the CPU for the ability to reenter real mode. This is not my intent, and I consider that extra code nothing but junk code. The problem is what parts of it are junk, and what parts are required? My intent is to write a 32bit application that initiates protected mode, jumps into protected mode, and then runs in protected mode until the system is reset (reboot by poweroff on real hardware, or DosBox is closed and restarted when using DosBox). This should HUGELY simplify the code, but there's one problem. I can't find even ONE piece of sample code online where the person is entering protected mode with the intent to not leave protected mode. So I have NOTHING to look at to guide my programming efforts. I hope somebody here will be able to help me.
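On the question of which parts are required when the program never returns to real mode: the GDT load, the CR0 write and the far jump are the mode switch itself. What typically remains before useful 32-bit code can run is loading the data segment registers and a stack pointer. A minimal sketch of that step, assuming the GDT layout used in this thread; the `DATA_SEL` name, the stack address 0x90000 and the write to text-mode video memory at 0xB8000 are illustrative assumptions, not something the thread tested:

```nasm
; Sketch only: what start32 might do first. Values marked "assumed" are illustrative.
bits 32
DATA_SEL	equ 0x10		; third GDT entry (index 2), the data descriptor

start32:
	mov	ax, DATA_SEL
	mov	ds, ax			; data segment registers still hold real-mode values
	mov	es, ax			;   after the far jump, so reload them all
	mov	fs, ax
	mov	gs, ax
	mov	ss, ax
	mov	esp, 0x90000		; assumed-free RAM for the stack
	mov	dword [0xB8000], 0x2F4B2F4F	; "OK" at top-left (assumes colour text mode)
.hang:
	jmp	.hang			; interrupts stay off: no IDT has been set up
```

Nothing here saves real-mode state. Since no IDT is installed, interrupts have to stay disabled until one is.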

fredericopissarra · « **Reply #4 on:** January 30, 2019, 07:15:42 PM »

Unlike real-mode segment values, selectors in i386 protected mode have a structure. The first 2 bits represent the requested privilege level (RPL) and bit 3 says if GDT or LDT will be used. So, the index must be shifted 3 bits to the left.

In "JMP 8:_start32" the segment part specifies index 1, but also RPL=0 and GDT (0).

ben321 · « **Reply #5 on:** January 30, 2019, 07:58:26 PM »

Quote from: fredericopissarra on January 30, 2019, 07:15:42 PM

Unlike real-mode segment values, selectors in i386 protected mode have a structure. The first 2 bits represent the requested privilege level (RPL) and bit 3 says if GDT or LDT will be used. So, the index must be shifted 3 bits to the left.

In "JMP 8:_start32" the segment part specifies index 1, but also RPL=0 and GDT (0).

Bit 3 says if GDT or LDT is used? Don't you mean bit 2? Bits are labeled from bit number 0 (least significant bit), so the 3rd bit is called bit 2.

fredericopissarra · « **Reply #6 on:** February 01, 2019, 10:57:01 AM »

Quote from: ben321 on January 30, 2019, 07:58:26 PM

Bit 3 says if GDT or LDT is used? Don't you mean bit 2? Bits are labeled from bit number 0 (least significant bit), so the 3rd bit is called bit 2.

Yep... sorry... bit 2... I mean 3rd bit...
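With the bit numbering settled (bits 1..0 are the RPL, bit 2 selects GDT or LDT, bits 15..3 are the index), the selector values used in this thread can be written out as assembler constants instead of bare numbers. A small sketch; the constant names are hypothetical:

```nasm
; Selector layout: bits 15..3 = index, bit 2 = TI (0 = GDT, 1 = LDT), bits 1..0 = RPL
RPL0		equ 0			; requested privilege level 0
TI_GDT		equ (0 << 2)		; table indicator: use the GDT
CODE_SEL	equ (1 << 3) | TI_GDT | RPL0	; index 1 -> 0x08
DATA_SEL	equ (2 << 3) | TI_GDT | RPL0	; index 2 -> 0x10
```

The far jump from the earlier post could then be written as `jmp CODE_SEL:(start32+0x1FE0)`, which makes it visible that the 8 is index 1 shifted left by three bits.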

debs3759 · « **Reply #7 on:** February 02, 2019, 12:12:15 AM »

Quote from: ben321 on January 30, 2019, 06:22:06 AM

Now the next step is to figure out the easiest (least lines of code) way to enable the A20 line, for COMPLETE access to the full 32bit memory space.

My code to test and enable A20 is:

Code: [Select]

%define	KBC_Control		060h	; KBC Control Port
%define	KBC_Status		064h	; KBC Status Port

%define	KBC_in_buf_full		002h	; Input buffer full
%define	KBC_enable_A20		0DFh	; enable A20 command


;-------------------------------------------------------------------------------+
; EnableA20:									+
;										+
;	input:		none							+
;										+
;	output:		CF		A20 is set				+
;			NC		A20 is not set				+
;										+
; This routine first tests whether A20 is set. If not, it tries to set it using	+
;	Port 92 (Fast A20). If that fails, it then writes to the keyboard	+
;	controller to set it.							+
;										+
; If A20 is enabled the routine returns with CF set, else CF is cleared.	+
;-------------------------------------------------------------------------------+

section .text

EnableA20:
	push	ax
	push	dx

	call	TestA20			; is A20 set?
	jc	.1			; yes, exit

	in	al,092h			; read Port 92h
	or	al,02h			; Set A20 bit
	out	092h,al			; write it - set A20

	call	TestA20			; Test A20 again...
	jc	.1			; if set, exit

	mov	ah,KBC_enable_A20
	call	KBC_GateA20

	call	TestA20			; Test A20 again...

.1:
	pop	dx			; restore in reverse order of the pushes
	pop	ax

	ret

;-------------------------------------------------------------------------------+
; TestA20:									+
;										+
;	input:		none							+
;										+
;	output:		CF		A20 is set				+
;			NC		A20 is not set				+
;										+
; This routine tests whether A20 is set or not.					+
;-------------------------------------------------------------------------------+

section .text

TestA20:
	push	ds
	push	es

	push	ax			; used a few times...
	push	bx			; likewise....
	push	si
	push	di

	xor	si,si
	xor	di,di
	mov	ds,si			; DS:SI points to Int 0 vector

	mov	ax,0FFFFh
	mov	es,ax
	mov	di,010h			; ES:DI points to:
					;	0x000000 if A20 not set (wraps around)
					;	0x100000 if A20 is set
	cli

	push	word [DS:0]		; save Int 0 vector

	lodsw				; ax contains int 0 vector
	mov	bx,ax			; save in bx
	dec	ax			; decrement ax
	stosw				; and save to ES:DI

	xor	si,si
	xor	di,di
	lodsw				; read DS:SI again

	cmp	ax,bx			; if A20 is set, ZF is set

	pop	word [DS:0]		; restore Int 0 vector

	sti

	clc
	jnz	.1			; if ZF
	stc				; then set CF
.1:

	pop	di
	pop	si
	pop	bx
	pop	ax

	pop	es
	pop	ds

	ret

;-------------------------------------------------------------------------------+
; KBC_Wait:									+
;										+
;	input:		none							+
;										+
;	output:		NC	= buffer is empty				+
;			CF	= buffer is not empty				+
;-------------------------------------------------------------------------------+

section .text

KBC_Wait:
	push	ax			; save ax
	push	bx			; save bx
	push	cx			; save cx
	mov	bx,5			; set super-long timeout
	xor	cx,cx			; cx=0:  timeout value
.1:
	out	0edh,ax			; I/O delay
	in	al,KBC_Status		; read 8042 status port
	and	al,KBC_in_buf_full	; input buffer full flag (D1)
	loopnz	.1			; loop until input buffer empty
						;   or timeout
	jz	.2			; success!
	dec	bx			; are we done yet?
	jnz	.1			; keep trying
	or	al,al			; success on last try?
.2:
	clc
	jz	.3
	stc
.3:
	pop	cx			; restore cx
	pop	bx			; restore bx
	pop	ax			; restore ax
	ret

;-------------------------------------------------------------------------------+
; KBC_GateA20:									+
;										+
;	input:		ah	= command to send to KBC			+
;										+
;	output:		none: the popf at A20_Fail restores the		+
;			caller's flags, so re-test with TestA20		+
;-------------------------------------------------------------------------------+

section .text

KBC_GateA20:
	pushf				; save interrupt status
	cli				; disable ints while using 8042

	Call	KBC_Wait		; ensure 8042 input buffer empty
	jc	A20_Fail		; ret: 8042 unable to accept cmd

	out	0edh,ax			; I/O delay
	mov	al,0D1h			; 8042 cmd to write output port
	out	KBC_Status,al		; output cmd to 8042

	Call	KBC_Wait		; wait for 8042 to accept cmd
	jc	A20_Fail		; ret: 8042 unable to accept cmd

	mov	al,ah			; 8042 port data
	out	KBC_Control,al		; output port data to 8042

	Call	KBC_Wait		; wait for 8042 to accept port data
	jc	A20_Fail

	push	cx			; save CX
	mov	cx,14h			;
.DLY:
	out	0edh,ax			; Wait for KBC to execute the
	loop	.DLY			;  command.  (about 25uS)
	pop	cx			; restore CX

	clc

A20_Fail:
	popf				; restore flags
	ret

It's not compact, as it contains lots of checks, but if you need to remove the checks to make it more compact, that should be easy enough. The I/O delays are needed on some systems, as without them your code may be too fast for the keyboard controller.

ben321 · « **Reply #8 on:** February 02, 2019, 04:43:59 AM »

Quote from: debs3759 on February 02, 2019, 12:12:15 AM

It's not compact, as it contains lots of checks, but if you need to remove the checks to make it more compact, that should be easy enough. The I/O delays are needed on some systems, as without them your code may be too fast for the keyboard controller.

Is there a way to do it without the keyboard controller? Given that Protected 32bit mode is an intentional feature in the CPU, you'd think that Intel would have given a more direct route to activate this mode, than require you to do a hacky thing with the KBC. I mean, I'd think they'd make it so activating 32bit protected mode automatically activated the A20 gate, since there is no circumstance in which you would need to run 32bit protected mode with A20 gate disabled (the whole A20 gate thing is only for 16bit legacy modes anyway). In 32bit protected mode, you will always want to activate the A20 gate.

And even if the A20 gate requires a separate action to activate it, you'd think that Intel would create an opcode directly in the CPU for this VERY IMPORTANT operation, so one line of assembly code could activate the A20 gate. Or at the very least you'd think it would be something that could be activated via the correct output port using an OUT instruction, or via the correct BIOS interrupt with the INT instruction.

debs3759 · « **Reply #9 on:** February 02, 2019, 05:56:06 PM »

I think port 92 is independent from the KBC. It's certainly much faster than the older method, but not supported on all older systems. This link will tell you more: https://www.win.tue.nl/~aeb/linux/kbd/A20.html
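For reference, a minimal sketch of the two routes that do not involve the keyboard controller: the BIOS call INT 15h with AX=2401h, and the port 92h ("Fast A20") write. Neither is available on every machine, so a real program would probably still verify the result with a test such as TestA20 above. The label names are hypothetical.

```nasm
bits 16
; Sketch only: both routines are meant to run in real mode, before the switch
; to protected mode.

enable_a20_bios:			; hypothetical label
	mov	ax, 0x2401		; INT 15h, AX=2401h: enable A20 (not implemented by every BIOS)
	int	0x15			; CF clear on success, set on failure
	ret

enable_a20_fast:			; hypothetical label
	in	al, 0x92		; System Control Port A
	test	al, 2			; bit 1 set means A20 is already enabled
	jnz	.done
	or	al, 2			; set the A20 bit
	and	al, 0xFE		; keep bit 0 clear: writing 1 there resets the machine
	out	0x92, al
.done:
	ret
```

Masking bit 0 before the write matters because that bit triggers a fast reset on systems that implement the port.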
