Ok, it turns out we were both partly right. You were right that the value loaded into a segment register acts as a BYTE offset into the GDT (not the entry number in the GDT). Strictly speaking, the selector is the entry number times 8, with the low 3 bits holding the table indicator and privilege level, so for ring 0 GDT entries it comes out equal to the byte offset.
I was correct in assuming (when doing the far jump) that there needed to be an offset added to the label. The offset is the program's load segment (as used in real mode) times 16. The reason is that when DOS runs a COM file, there is a bunch of stuff in memory before the COM file (in my case, everything below segment 0x1FE), and that stuff is DOS itself. The descriptor for selector 8 has a base of 0, so the jump target must be a linear address, not an offset within the real-mode segment. Unless you want to far jump into DOS's own code (and crash the OS), you NEED that offset added to the label "start32". The final version of the far jump that I have tested (and I know it works) is this:
jmp 8:(start32+0x1FE0)
So my complete working code is now this:
org 0x100
USE16 ;use 16bit code
start:
cli ;disable interrupts
lgdt [GDTP] ;Load GDTR from the 6-byte structure at GDTP (aka GDT Pointer), which holds the GDT's limit and linear base address
;perform procedure to put the CPU into Protected Mode
mov eax,cr0
or al,1
mov cr0,eax
jmp 8:(start32+0x1FE0)
;In protected mode with a base-0 code segment, start32 lives at its label's address + the linear base of the program (16 times the real-mode segment value of 0x1FE)
;The real-mode "segment times 16" rule no longer applies once bit 0 of cr0 is set to 1! From then on, what is in the segment registers is a "selector" - an index into the descriptor table.
USE32 ;32bit code starts here
start32:
nop
nop
nop
jmp start32
GDTP: ;GDT pointer
dw 23 ;GDT limit = size in bytes minus 1 (the GDT is 24 bytes, so the last valid byte offset is 23)
dd GDT+0x1FE0 ;In absolute memory space, the GDT starts at its label's address + the segment address (which is 16 times the segment value of 0x1FE)
;The following GDT sets the code and data segments to be the same, and makes it start at the base address of 0 and occupy all 4GB possible for 32bit memory
GDT: ;Copied GDT from Wikipedia
; offset 0x0
; null descriptor:
dq 0
; offset 0x8
; code: ; cs should point to this descriptor
dw 0xffff ; segment limit first 0-15 bits
dw 0 ; base first 0-15 bits
db 0 ; base 16-23 bits
db 0x9a ; access byte
db 11001111b ; high 4 bits (flags) low 4 bits (limit 4 last bits)(limit is 20 bit wide)
db 0 ; base 24-31 bits
; offset 0x10
; data: ; ds, ss, es, fs, and gs should point to this descriptor
dw 0xffff ; segment limit first 0-15 bits
dw 0 ; base first 0-15 bits
db 0 ; base 16-23 bits
db 0x92 ; access byte
db 11001111b ; high 4 bits (flags) low 4 bits (limit 4 last bits)(limit is 20 bit wide)
db 0 ; base 24-31 bits
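One caveat with the listing above: 0x1FE0 is only correct when DOS happens to load the program at segment 0x1FE. A minimal sketch of computing the base at run time instead is shown here. It assumes FASM syntax, a COM file (so CS holds the load segment at entry), a load address below 1MB, and plain real mode (no EMM386 or other V86 monitor). The label jump_target and the patching approach are illustrative and not part of the original code.

```asm
org 0x100
use16
start:
        cli                             ; no interrupts from here on
        xor     eax, eax
        mov     ax, cs                  ; real-mode segment DOS loaded the COM file at
        shl     eax, 4                  ; linear base = segment * 16
        add     dword [GDTP+2], eax     ; turn the GDT label offset into a linear address
        add     dword [jump_target], eax ; same for the start32 label
        lgdt    [GDTP]
        mov     eax, cr0
        or      al, 1                   ; set PE (bit 0 of CR0)
        mov     cr0, eax
        jmp     fword [jump_target]     ; indirect far jump through a 16:32 pointer in memory

use32
start32:
        jmp     start32                 ; placeholder 32-bit code

jump_target:                            ; hypothetical name: far pointer, offset first, then selector
        dd      start32                 ; patched to a linear address at run time
        dw      8                       ; code selector (GDT byte offset 8)

GDTP:
        dw      23                      ; limit = 24 bytes - 1
        dd      GDT                     ; patched to a linear address at run time

GDT:
        dq      0                       ; null descriptor
        dq      0x00CF9A000000FFFF      ; code: base 0, limit 4GB, access 0x9a
        dq      0x00CF92000000FFFF      ; data: base 0, limit 4GB, access 0x92
```

The two dq values encode the same bytes as the field-by-field descriptors above. The indirect jump still reads jump_target through DS with its real-mode base, which works because DS equals CS in a COM file.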
Now the next step is to figure out the easiest way (fewest lines of code) to enable the A20 line, for COMPLETE access to the full 32bit memory space. Without it, address bit 20 is forced to 0, so every odd megabyte of memory is unreachable.
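One commonly used short method is the "fast A20" gate on system control port 0x92. The sketch below assumes the machine or emulator implements that port, which is not guaranteed on every system; the keyboard controller method and the BIOS call INT 15h with AX=0x2401 are the usual fallbacks. The label a20_done is illustrative.

```asm
use16
        in      al, 0x92        ; read system control port A
        test    al, 2           ; bit 1 set means A20 is already enabled
        jnz     a20_done
        or      al, 2           ; set bit 1 to enable A20
        and     al, 0xFE        ; keep bit 0 clear: writing 1 there triggers a fast reset
        out     0x92, al
a20_done:
```

This would go before the switch to protected mode. If a memory manager such as HIMEM.SYS is loaded, A20 may already be on, which is why the sketch checks the bit first.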
Note that all the online examples I've seen for protected mode are written with the assumption that one will back out of protected mode at some point and go into real mode again. The result is a huge amount of code written to save the state of real mode and prepare the CPU to reenter it. This is not my intent, and I consider that extra code nothing but junk code. The problem is: which parts of it are junk, and which parts are required? My intent is to write a 32bit application that sets up protected mode, jumps into protected mode, and then runs in protected mode until the system is reset (reboot by poweroff on real hardware, or DOSBox is closed and restarted when using DOSBox). This should HUGELY simplify the code, but there's one problem. I can't find even ONE piece of sample code online where the person enters protected mode with the intent to never leave it. So I have NOTHING to look at to guide my programming efforts. I hope somebody here will be able to help me.
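For a program that never returns to real mode, the setup on the 32bit side can stay small. The sketch below shows what would typically go at start32, assuming the GDT layout above (data descriptor at byte offset 0x10) and interrupts left disabled for good. The stack address 0x90000 and the label hang are hypothetical choices for illustration.

```asm
use32
start32:
        mov     ax, 0x10        ; data selector: GDT byte offset 0x10, ring 0
        mov     ds, ax
        mov     es, ax
        mov     fs, ax
        mov     gs, ax
        mov     ss, ax
        mov     esp, 0x90000    ; hypothetical stack top in conventional memory
        ; 32bit application code would go here
hang:
        jmp     hang            ; nothing to return to, so spin forever
```

Nothing from real mode (the old IDT, segment values, or stack pointer) is saved, because nothing is ever restored. The segment registers other than CS still hold real-mode values after the far jump, so they need reloading before any data or stack access. As long as interrupts stay off, no IDT is loaded here; an exception in that state would reset the machine, and turning interrupts on would require an IDT and reprogramming the interrupt controller first.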