NASM - The Netwide Assembler
NASM Forum => Programming with NASM => Topic started by: hippyhappo on January 24, 2017, 08:56:32 PM
-
I'm toying around with a boot sector / creating a flat binary file, and have come across some behavior I don't quite understand. For example, consider the following:
mov ax, 0x7C0
mov ds, ax
fooA db 'A'
mov ah, 0xE
mov al, [fooA]
int 0x10 ; BIOS interrupt / print "A" to screen
dw 0xFFFF
fooB db 'B'
mov ah, 0xE
mov al, [fooB]
int 0x10 ; BIOS interrupt / print "B" to screen
times 512 - ($ - $$) db 0
This prints only the "A" to the screen (not the "B"). However, if I replace "dw 0xFFFF" with a different value (such as 0x0000, or 0xEEEE), both "A" and "B" are correctly printed.
If I explicitly create a .data section for dw 0xFFFF it corrects the problem, but I thought sections were irrelevant in a flat binary file. Additionally, why do the db 'A' and db 'B' definitions work outside of a .data section? Why do some word values work (e.g. 0x0000, or 0xEEEE), while 0xFFFF doesn't?
Hopefully somebody more knowledgeable than myself understands what I'm getting at. I feel like I have a fundamental misunderstanding of something, but I'm not quite sure what it is. If anybody could clear some of this up, it would be much appreciated.
Thanks
-
Hi hippyhappo,
Welcome to the forum. You are correct that there are no "segments" in a flat binary file. However, NASM takes "section .data" ("segment .data" is an alias) as an instruction to "move this stuff to the end", after the code (and "section .bss" after that). You probably don't want to do this in a bootsector, as it will move this stuff after your padding - which you want at the end.
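A minimal sketch of that pitfall, assuming the file is assembled with "nasm -f bin". The label names (msg, hang) and the file name are illustrative and do not come from the original post:

```nasm
; Sketch only: shows why "section .data" is a poor fit for a boot sector.
; Assumed build command: nasm -f bin boot.asm -o boot.bin
bits 16

section .text
    mov ax, 0x7C0
    mov ds, ax
    mov ah, 0xE
    mov al, [msg]               ; msg resolves to an offset past the padding
    int 0x10                    ; BIOS teletype output
hang:
    jmp hang                    ; keep the CPU out of the padding

    times 512 - ($ - $$) db 0   ; pads the .text section to 512 bytes

section .data
msg db 'A'                      ; emitted after .text, i.e. after the padding
```

The output file comes out larger than 512 bytes, and since the BIOS loads only the first 512-byte sector, the byte at msg is never brought into memory at boot.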
You do not want to put your data in the middle of your code - it will be executed (if possible). 'A' is executed as "inc cx" - useless but harmless. 0xFF plus 'B' plus part of "mov ah, 0xE", however, decodes as "inc word [bp + si - 0x4C]" and some other garbage. You can see this if you disassemble your code with ndisasm. If you change the 0xFFFF to something else, it will execute as something else, which will (sometimes) get "synced up" in time to execute your print. Observe with ndisasm.
This is a general rule. You don't want to put something intended as data in the middle of code. It will execute, perhaps harmlessly or perhaps messing up following code. In a bootsector, put it after your code but before your padding.
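To make the decoding concrete, here is the start of the original boot sector written out byte by byte. This is a hand-assembled sketch, assuming the default origin of 0 (so fooA sits at offset 0x05 and fooB at 0x0F). It can be checked by assembling it with "nasm -f bin" and running "ndisasm -b 16" on the result:

```nasm
; Sketch only: the same bytes NASM should emit for the original source,
; annotated with how they look to the processor.
bits 16
    db 0xB8, 0xC0, 0x07   ; 0x00: mov ax, 0x7C0
    db 0x8E, 0xD8         ; 0x03: mov ds, ax
    db 0x41               ; 0x05: fooA = 'A', decoded as inc cx
    db 0xB4, 0x0E         ; 0x06: mov ah, 0xE
    db 0xA0, 0x05, 0x00   ; 0x08: mov al, [0x0005]
    db 0xCD, 0x10         ; 0x0B: int 0x10, prints "A"
    db 0xFF, 0xFF         ; 0x0D: dw 0xFFFF. Opcode FF with ModRM FF
                          ;       selects /7, which is not a defined
                          ;       instruction.
    db 0x42               ; 0x0F: fooB = 'B'. Read from 0x0E instead,
    db 0xB4, 0x0E         ; 0x10: FF 42 B4 is inc word [bp+si-0x4C],
                          ;       and the leftover 0E is push cs.
    db 0xA0, 0x0F, 0x00   ; 0x12: mov al, [0x000F]
    db 0xCD, 0x10         ; 0x15: int 0x10
```

With dw 0x0000 the two bytes decode as "add [bx+si], al", and with dw 0xEEEE as two "out dx, al" instructions. Both are complete instructions, so decoding lines up again at 'B' (which runs as "inc dx") and the second print executes as intended.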
Best,
Frank
-
Thanks a bunch for the info. That clears up a lot of the confusion, and also explains why the boot sector file was larger than 512 bytes when the .data section was explicitly specified (I didn't realize that NASM moved it past the padding).
So if I include multiple .data sections / external files / macros / libraries (with their own data sections), NASM then rearranges / assembles those into a single data section at the bottom as well?
How exactly does the processor know where the data section begins / that something should be evaluated as data, or executable code? I was under the impression that as far as the processor is concerned, there isn't really any difference between data, and executable code (i.e. that each instruction is essentially an independent / self-contained package, including any relevant modifiers and data elements with the instruction opcode). If that is the case, why is it necessary for data elements to be contiguous?
Having programmed in numerous higher level languages prior to assembly, I think I'm subconsciously making a lot of incorrect assumptions based on that background knowledge / mixing up concepts.
Anyway, thanks again for your help.
-
Well, a flat binary file wouldn't have libraries, but include files and macros would combine data sections (not really "segments") at the end. I should have mentioned that putting "data" at the end doesn't prevent it from being executed. You'd normally end a bootsector with something like:
here: jmp here
; data and padding
to prevent the data from being executed (or return to the OS if you've got one). You can also jump over the data. It is normal for a bootsector to begin with either a short jump and a nop, or a near jump, followed by the "BIOS Parameter Block" (information about the disk - this is what makes it "formatted").
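Applied to the boot sector from the first post, that advice might look like the sketch below. The label name hang and the boot signature in the last line are assumptions added for illustration; the original padded to 512 bytes with zeros and had no signature:

```nasm
; Sketch only. Assumed build command: nasm -f bin boot.asm -o boot.bin
bits 16

    mov ax, 0x7C0
    mov ds, ax          ; DS:0 now points at the loaded sector

    mov ah, 0xE
    mov al, [fooA]
    int 0x10            ; BIOS interrupt / print "A" to screen

    mov ah, 0xE
    mov al, [fooB]
    int 0x10            ; BIOS interrupt / print "B" to screen

hang:
    jmp hang            ; execution never falls through into the data

; data lives after the last instruction the CPU can reach
fooA db 'A'
     dw 0xFFFF          ; no longer in the execution path
fooB db 'B'

    times 510 - ($ - $$) db 0   ; pad up to the signature
    dw 0xAA55                   ; boot signature (assumption, see above)
```

The data is still in the same flat file and no sections are involved. The only change is that nothing directs the processor into it.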
You are correct that the processor doesn't know anything about "data". It's all just bytes. If we don't tell it to go someplace else, it just loads the next byte. An instruction may include some prefix bytes, one or more bytes of "instruction", and perhaps some operands. The processor "knows" how many bytes to gobble up. The prefix byte can alter this. "mov ax, xx" and "mov eax, xxxx" use the same opcode (!), but the latter expects four bytes of operand; the operand-size prefix, together with the mode the processor is running in, decides which one is meant. This is one reason we can't use BIOS interrupts (for example) in 32-bit code: the BIOS routines are 16-bit code.
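The shared opcode is easy to see in the encodings. A small sketch, with the expected bytes in the comments (the file name in the build comment is hypothetical):

```nasm
; Sketch only. Assumed check: nasm -f bin sizes.asm -o sizes.bin,
; then inspect the bytes with a hex dump or ndisasm.
bits 16
    mov ax, 0x1234          ; B8 34 12
    mov eax, 0x12345678     ; 66 B8 78 56 34 12  (0x66 = operand-size prefix)

bits 32
    mov eax, 0x12345678     ; B8 78 56 34 12
    mov ax, 0x1234          ; 66 B8 34 12
```

The byte 0xB8 appears in all four lines. Whether two or four operand bytes follow depends on the current mode and on the 0x66 prefix, so 16-bit code decoded in 32-bit mode would swallow the wrong number of bytes.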
Assembly language differs from higher level languages in that it doesn't do any programming for us. It's all up to us. Assembly language is just "names for the machine code bytes" (human semi-readable or semi-human readable... I'm not sure). We have to do things like not trying to execute something intended as data, and returning to the OS (if any).
Best,
Frank
-
Ahhhh, I think this was the missing link. I was missing (or rather, hadn't given conscious consideration to) the big picture idea that it falls entirely on us to direct the processor through the intended executable code / away from the data section. When I reconsider my questions (while keeping this idea in the back of my mind), everything I was a bit shaky on just fell into place / clicked in my head.
Thanks again for all your help. I had gone through countless books / tutorials / forum posts, running into a wall over and over for about a week, and you cleared it all up in a couple of paragraphs.