Author Topic: I don't understand the times expressions  (Read 34817 times)

Devoum
« on: March 04, 2010, 01:10:07 PM »
Hey all.

I'm a complete beginner in assembly and have chosen NASM to work my way up with, but I have a problem with the pseudo-instructions. More precisely, I don't understand how the expressions "$" and "$$" are used in conjunction with "times".

It's no problem for me to understand what "times" does, but I do not understand the expressions used together with times.

For example:

times 512-($-$$) db 0

It is explained what it does (in the tutorial), but not how it is done.

My questions about this example:
1. Why is there a dash ("-") after 512 and before "("?
2. Can someone elaborate on what $ means and is used for?
3. Can someone elaborate on what $$ means and is used for?

I have read the two lines of description about $ and $$ multiple times now, and I still don't understand what they are. Can someone help?

Frank Kotler (NASM Developer)
« Reply #1 on: March 04, 2010, 04:42:51 PM »
The mysterious '-' is a minus sign. That takes care of that part. :)

Nasm uses '$' for several purposes; in this case it's "here" - the current assembly location. More precisely, it's the location at the beginning of the line.

'$$' is the beginning of the current section.

You might reasonably think that '$' would be enough, but '$', like any symbol, is a "relocatable value". That is, the linker and loader can alter this value in ways Nasm knows nothing about. In the specific case of "-f bin" output, Nasm acts as its own linker, but that happens in the "output driver", and we need to know this value in the actual "assemble" routine, which runs first. This is what Nasm means when it says "not a scalar value". It's a vector??? No, it's a "relocatable value".

In some cases, a relocatable value is okay...

Code: [Select]
mov eax, $

Nasm doesn't need to know the value at "assemble-time", it codes "mov eax, (this label)", and the linker/loader will fill in the relocated value. But:

Code: [Select]
mov eax, $ << 1

... now we're asking Nasm to do an assemble-time calculation on a value it doesn't know, so it whines.
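As a rough sketch of one way around that (the file name and the "nop" are just made up for illustration): do the arithmetic on the offset from the start of the section, '$ - $$', which is a plain number. Note that this gives the offset into the section, which only equals the address when the origin is 0.

```nasm
; Assemble with: nasm -f bin shift.asm -o shift.bin
bits 32
org 0

    nop                         ; one byte, so '$' on the next line is $$ + 1
    mov eax, ($ - $$) << 1      ; ($ - $$) is a scalar (1 here), so the shift is allowed
    ; mov eax, $ << 1           ; uncomment to see Nasm complain about the relocatable value
```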

The difference between two labels (in the same section!) is a "scalar value". The linker/loader can change the two values, but the distance between them remains the same. Therefore, we can do a calculation on it - and it's an acceptable argument to "times".

Code: [Select]
org 0 ; the default, if no origin specified
section .text ; the default, if no section specified

nop
nop
nop ; three bytes of "code"

times 512 - ($ - $$) db 0

Here '$' is 3 and '$$' is 0, so we're emitting 509 zeros to bring the total file size to 512 bytes. With an origin of 0, just "512 - $" would give the same number, if Nasm would accept it, which it won't.

Code: [Select]
org 7C00h
section .text

nop
nop
nop ; three bytes of "code"

times 512 - ($ - $$) db 0

Now, '$' is 7C03h and '$$' is 7C00h. We're still adding 509 zeros to pad to 512 bytes, but in this case '$' by itself would *not* do what you want!

The expression "512 - ($ - $$)" can be written without the parentheses as "512 + $$ - $", but this is even less clear (IMHO). In a bootsector, 510 is more common than 512, since we want the two byte "boot signature" after it, and the entire bootsector needs to be 512 bytes, but that doesn't alter '$' and '$$'...
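A minimal sketch of that bootsector layout, assuming the usual 55h, AAh signature in the last two bytes. The "jmp $" loop and the file name are only there for illustration; a real bootsector would have actual code in their place.

```nasm
; Assemble with: nasm -f bin bootpad.asm -o bootpad.bin
; The output file should be exactly 512 bytes long.
org 7C00h

start:
    jmp $                       ; endless loop: jump to the start of this very line

    times 510 - ($ - $$) db 0   ; pad with zeros up to offset 510 in the section
    dw 0xAA55                   ; boot signature: bytes 55h, AAh at offsets 510 and 511
```

If you add code before the "times" line, '$ - $$' grows and the count of zeros shrinks by the same amount, so the signature stays in the last two bytes.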

Does that make you less confused, or more? :)

Best,
Frank


Cyrill Gorcunov (NASM Developer)
« Reply #2 on: March 04, 2010, 06:26:14 PM »
I suspect it could be explained with an ASCII drawing :) Something like:

Code: [Select]
+ ---> start of section = $$
| some code
| some code
| some code
| some code
+ ---> current program counter = $
|
|
+ ---> how many bytes we need to skip/fill until the 512-byte boundary is reached = 512 - ($ - $$)
« Last Edit: March 04, 2010, 06:30:12 PM by Cyrill Gorcunov »

Devoum
« Reply #3 on: March 04, 2010, 06:40:40 PM »
Thank you both.

So '$' is a pointer to how far into the code we are positioned.
And '$$' is how far the section is?

Maybe a bit badly put on my part =)

Either way, I think I understand the whole thing now.

One more thing though: when you say "512 - ($ - $$)" is the same as "512 + $ - $$", don't you mean "512 - $ - $$"?
And also, from my point of view, when we take 7C03 - 7C00 = 3, then
512 - 3 = 509,
Does that mean the instruction actually becomes "times 509 db 0"? (Just to really dumb it down for me. I'm not the best in school)

I appreciate your time a lot.
« Last Edit: March 04, 2010, 06:43:09 PM by Devoum »

Cyrill Gorcunov (NASM Developer)
« Reply #4 on: March 04, 2010, 07:17:03 PM »
Quote
Thank you both.

So '$' is a pointer to how far into the code we are positioned.
And '$$' is how far the section is?

Yeah, something like that :)

Quote
Maybe a bit badly put on my part =)

Either way, I think I understand the whole thing now.

One more thing though: when you say "512 - ($ - $$)" is the same as "512 + $ - $$", don't you mean "512 - $ - $$"?
And also, from my point of view, when we take 7C03 - 7C00 = 3, then
512 - 3 = 509,
Does that mean the instruction actually becomes "times 509 db 0"? (Just to really dumb it down for me. I'm not the best in school)

I appreciate your time a lot.

Frank was exactly right in the details. You can expand the parentheses:

512 - ($ - $$) = 512 + $$ - $

Example

$$ = 12, $ = 300

512 - (300 - 12) = 512 - 288 = 224

or

512 + 12 - 300 = 224

:)
« Last Edit: March 04, 2010, 07:22:01 PM by Cyrill Gorcunov »

Frank Kotler (NASM Developer)
« Reply #5 on: March 04, 2010, 07:20:59 PM »
Right. It's like writing "db 0" 509 times, except that it'll reduce the number of "db 0"s if you add code, to keep the total length at 512 bytes.

Best,
Frank


Devoum
« Reply #6 on: March 04, 2010, 08:27:30 PM »
Quote

Frank was exactly right in the details. You can expand the parentheses:

512 - ($ - $$) = 512 + $$ - $

Example

$$ = 12, $ = 300

512 - (300 - 12) = 512 - 288 = 224

or

512 + 12 - 300 = 224

:)

Lol!

I didn't notice they were reversed.

512 - ($ - $$)
512 + $$ - $

And this my friends, is why I flunked math.

Anyway, everything is clear as the sky now, thanks for your great help guys. You'll probably hear more questions from me (although researched to the bone, of course) in the near future =)

Devoum
« Reply #7 on: March 05, 2010, 10:58:34 AM »
I have one more simple question:

In the manual it shows:

Code: [Select]
message db 'hello, world'
msglen equ $-message

$ is 1 in this case, right? And message is 12 obviously. But if we do 1-12 we get -11. How is $ used in this case? I understand we don't just do

Code: [Select]
msglen equ message
Because then msglen would be 'hello, world', but I don't see how we use $ to convert the string to its numeric value.

Cyrill Gorcunov (NASM Developer)
« Reply #8 on: March 05, 2010, 03:46:38 PM »
Quote
I have one more simple question:

In the manual it shows:

Code: [Select]
message db 'hello, world'
msglen equ $-message

$ is 1 in this case, right? And message is 12 obviously. But if we do 1-12 we get -11. How is $ used in this case? I understand we don't just do

Code: [Select]
msglen equ message
Because then msglen would be 'hello, world', but I don't see how we use $ to convert the string to its numeric value.

No, $ here is evaluated after the end of the string has already been reached, i.e. if we assume $ = 0 at the beginning of "message", then the $ in msglen will be 12.

Think of "message" as a pointer to the string (it's just a memory address) and msglen as the length of the string.
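Here is a small sketch of why the "equ" line has to come directly after the string. The names "extra" and "toolate" and the file name are made up for this illustration, and the origin is assumed to be 0 so that "message" sits at offset 0.

```nasm
; Assemble with: nasm -f bin len.asm -o len.bin
org 0

message db 'hello, world'       ; 12 bytes emitted at offsets 0..11
msglen  equ $ - message         ; '$' is 12 here and message is 0, so msglen = 12

extra   db 1, 2, 3              ; three more bytes at offsets 12..14
toolate equ $ - message         ; '$' is 15 by now, so this is 15, not 12

        db msglen, toolate      ; emit both values so they are visible in the file: 0Ch, 0Fh
```

A hex dump of the output should end with the two bytes 0C 0F, which shows that '$' simply keeps counting the bytes emitted so far.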

Frank Kotler (NASM Developer)
« Reply #9 on: March 05, 2010, 08:24:11 PM »
I don't think I'm explaining this clearly. Maybe need to "start earlier" - you may not yet "get" what Nasm is doing for us.

Cyrill's last sentence brings up something that may help... It is a common convention to put "defined constants" (a.k.a "manifest constants"?) in ALL CAPS. I don't "like" ALL CAPS, so I often don't do it. But there's a fundamental difference between "message" and "msglen" that might be clearer if they "looked different".

I'm fond of "learning by example". This "example" is totally useless as "code", but a disassembly of it may help show what Nasm is doing with this stuff.

Code: [Select]
; not intended to be run!
; nasm -f bin dollar.asm -o dollar.bin
; ndisasm dollar.bin

; Try uncommenting one (only) of these, to
; see what "org" does. For ndisasm to get it right,
; it'll need "ndisasm -o 100h dollar.bin" (or -o 7C00h).
; By default, Nasm uses "org 0".
; org 100h
; org 7C00h

;------------------
section .data
; ".data" is a name Nasm "knows". "section .data" will be
; moved after "section .text"

message db "Hello, World!", 10
MSGLEN equ $ - message
FOO equ $
BAR equ $$

;------------------
section .text
; ".text" is a name Nasm "knows". It will be placed first in the file.

mov ax, message
; move the address (offset) into ax

mov ax, FOO
; move what '$' was when we first used it (the end of "message") into ax

mov ax, MSGLEN
; '$', where we first used it, minus "message" - the length (we hope)

mov ax, $
; now '$' has a different value

mov ax, BAR
; '$$', when we first used it, was the start of the .data section

mov ax, $$
; now '$$' is the start of the .text section, zero or
; the "org" (origin), if we specified one.

mov ax, [message]
; this moves the "[contents]" of the variable - 'H' and 'e'.
; Not useful in this case, just to illustrate the difference.


This is plain dumb 16-bit code. Assemble it with... well, just "nasm myfile.asm" would work, and save the output as just "myfile". I used "nasm -f bin -o myfile.bin myfile.asm" (actually named it "dollar.asm"... not very good).

Then disassemble it (don't try to run it) with "ndisasm myfile.bin". This just spews output to the screen. To save it, "ndisasm myfile.bin>myfile.dis". If you've uncommented an "org", tell ndisasm about it - "-o" switch to Nasm names the output file, "-o" switch to ndisasm gives the "origin". Here's what that gives us, with some comments added (ndisasm doesn't do comments).

Code: [Select]

; address    bytes in file     disassembly
00000000  B81800            mov ax,0x18 ; "message"
00000003  B82600            mov ax,0x26 ; what '$' was at end of message
00000006  B80E00            mov ax,0xe ; difference, the length
00000009  B80900            mov ax,0x9 ; the current '$'
0000000C  B81800            mov ax,0x18 ; what '$$' was in "section .data"
0000000F  B80000            mov ax,0x0  ; '$$' in "section .text"
00000012  A11800            mov ax,[0x18]

; From this point, ndisasm is attempting to disassemble data.
; Note that there's some zero padding to align "section .data".
; The "message" starts at address 0x18.

00000015  0000              add [bx+si],al
00000017  004865            add [bx+si+0x65],cl
0000001A  6C                insb
0000001B  6C                insb
0000001C  6F                outsw
0000001D  2C20              sub al,0x20
0000001F  57                push di
00000020  6F                outsw
00000021  726C              jc 0x8f
00000023  64210A            and [fs:bp+si],cx

If you're into "learning by example", fiddling around with something like this may help more than my long-winded explanations.

Best,
Frank


Devoum
« Reply #10 on: March 06, 2010, 04:30:17 PM »
I still don't get much of it, and that frustrates me (retarded at mathematics and logic).

As part of my nature, I usually want every detail of a problem explained. Like when I went to school, I usually failed at math, because I wanted to understand why things worked, instead of how they worked. So my math teacher told me, some things are just meant to be used, without knowing the underlying mechanics.

Even though it goes completely against my way of learning, is that something I should take into account with things I can't understand?

I know this is very off-topic, but being completely new at assembly and NASM, maybe it would be better for me to just accept "this is how it works", instead of asking too many questions?

Cyrill Gorcunov (NASM Developer)
« Reply #11 on: March 06, 2010, 05:29:14 PM »
Quote
I still don't get much of it, and that frustrates me (retarded at mathematics and logic).

Don't take it too hard :) Don't think of them as math exercises.

Let's simplify it. I suppose $ can be considered a byte counter. Every instruction
has its own length, and when nasm encodes an instruction it advances $ by that length.

The "message db 'Hello, World'" consumes bytes too, right? So when nasm reaches the
instruction (or rather our pseudo-instruction "msglen equ $-message") after this "message",
the counter (the $ sign) has been incremented by the length of the "message".

Frank, did I miss something?

Devoum
« Reply #12 on: March 07, 2010, 03:01:17 PM »
Quote
The "message db 'Hello, World'" consumes bytes too, right? So when nasm reaches the
instruction (or rather our pseudo-instruction "msglen equ $-message") after this "message",
the counter (the $ sign) has been incremented by the length of the "message".

Okay, so two things:
- $ increments after an instruction has been run through (is it assembled once?).
- Does it mean that in the case of "message db 'Hello World'", $ is incremented by the byte length of 'Hello World'? And what about the opcode bytes for the pseudo-instruction (label 'message' and db), if there are any?

Cyrill Gorcunov (NASM Developer)
« Reply #13 on: March 07, 2010, 04:36:43 PM »
Quote
Okay, so two things:
- $ increments after an instruction has been run through (is it assembled once?).

Yes, I think we may say so. Internally nasm makes a few passes over the source file(s).

Quote
- Does it mean that in the case of "message db 'Hello World'" $ is incremented with the byte length of 'Hello World'? And what about the opcode bytes for the pseudo-instruction (label 'message' and db), if there are any?

Yes, and a pseudo-instruction is more like "telling nasm what to do". The label is just a location address, so nasm
remembers it, and if some instruction refers to it, nasm substitutes this address. But the label itself doesn't consume
bytes. And DB just tells nasm "emit these byte(s) here for me".

So all this

Code: [Select]
message db 'Hello World'
is like telling nasm "put the bytes of 'Hello World' here for me", where 'message' is just a symbolic name for
the address of those bytes.
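A quick sketch to see that labels take no space while DB does. The labels "first", "second" and "after" and the file name are invented for this example, and the origin is assumed to be 0.

```nasm
; Assemble with: nasm -f bin labels.asm -o labels.bin
org 0

first:                          ; a label on its own line emits nothing
second:                         ; still offset 0
message db 'Hello World'        ; 11 bytes; message, first and second are all offset 0
after:                          ; offset 11

        db message - first      ; 0: all three labels mark the same place
        db after - message      ; 11: only the string bytes were counted
```

The output file should be 13 bytes: the 11 characters followed by the bytes 00 and 0B.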

Perhaps the easiest way would be to just remember the "times" line as a template.

Frank Kotler (NASM Developer)
« Reply #14 on: March 07, 2010, 09:21:52 PM »
Hi Devoum,

I'm a terrible teacher. I found this out when my daughter was small, and I tried to help her with schoolwork. I'd explain something, and if she didn't understand it, I was stuck. I explained it, what else? I'm trying to get better at it.

I sympathize with your desire to "understand everything". This might be a time, temporarily, to "just shut up and do it". It isn't worth getting too "hung up" on this one issue. Better to understand what you're doing, and why, though!

I began to suspect, when you said, "$ is 1 in this case, right? And message is 12 obviously." that you understand '$' alright. It's other things that Nasm is doing - or not doing - for us that is leading to the confusion. My little "example" was supposed to illustrate that, but I guess I didn't explain what it was supposed to illustrate! I probably confused the issue by using two "sections", too. Lemme try again...

message db "hello world"

When Nasm sees "message", it knows it isn't an instruction or register name - must be a variable name. Nasm does not emit anything, at this point. It "remembers" the location of "message" by making an entry in its "symbol table". This information may be emitted into the header of a linkable object file (later), but nothing goes into the "code" (and data) part of the program. This value would be zero, if it were the first thing in the file - generally non-zero (in my example, it's 0x18). '$', if we had used it at this point, would have the same value.

When Nasm sees "db", it's a "pseudo-instruction". This tells Nasm "just emit these bytes, don't try to 'assemble' it" (that's why it's "pseudo", I guess). So Nasm puts 'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd' (the ascii codes) into your file, counting "where are we" as it goes. Now...

msglen equ $ - message

None of this emits anything. "msglen", like "message" itself, is just a symbol. The "equ" defines the value of "msglen" to be "where we are now" (0, or whatever "message" was, plus what we just emitted) minus "where we started". This "where we are now" minus "where we started" is (obviously?) the length of what we emitted. This too is "remembered". The value stored for "message" is the location, and the value stored for "msglen" is whatever we defined with "equ", so they're a "different kind of symbol", but all we've emitted so far - which is what '$' counts - is "hello world".

Now, when we go to "use" these symbols...

mov ecx, message
mov edx, msglen

or perhaps,

push msglen
push message

Nasm sees that "mov" or "push" is an instruction, so it emits the appropriate byte(s). (some instructions encode into a single byte, some require multiple bytes) Then the word "message" or "msglen" is replaced by the value we "remembered". All the CPU ever sees is a bunch of numbers. I mention this because another poster wanted to modify the label names at runtime - this isn't going to work.
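To put the "mov" version in context, here's a minimal sketch of a complete program, assuming 32-bit Linux and the "int 80h" system call interface (that environment, the file name and the build commands are my assumptions, not something from the manual example):

```nasm
; Build (assumed toolchain): nasm -f elf32 hello.asm -o hello.o
;                            ld -m elf_i386 hello.o -o hello
section .data
message db "hello world", 10    ; the string plus a linefeed
msglen  equ $ - message         ; 12: a plain number, known at assemble time

section .text
global _start
_start:
    mov eax, 4                  ; sys_write
    mov ebx, 1                  ; file descriptor 1 = stdout
    mov ecx, message            ; the address of the string (filled in by the linker)
    mov edx, msglen             ; the length, already just the number 12
    int 80h

    mov eax, 1                  ; sys_exit
    xor ebx, ebx                ; exit code 0
    int 80h
```

By the time this runs, the names "message" and "msglen" are gone; the instructions only contain the numbers.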

Nasm has counted the "opcode plus operand" bytes that it has emitted, so '$' at this point would include them. In the "equ", '$' doesn't emit anything, but if we used it as an operand, it would:

mov eax, $

This will emit the opcode for "mov", and replace the '$' with its current value (the location at the beginning of the line - the "mov" opcode would not be included). If we used '$' again...

mov ebx, $

... Nasm would have counted the 5 bytes emitted by "mov eax, $" (in 32-bit code: one opcode byte plus a four-byte operand), but nothing from the "mov ebx, $" yet. Only after Nasm has figured out how many bytes "mov ebx, $" is going to emit can it increment '$'. It isn't called '$' internally - I don't know what it's called, offhand. We can drag the source code in here and interrogate it, if that'll help...
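If you want to watch the counter move, here's a tiny sketch, assuming 32-bit code and an origin of 0 (the file name is arbitrary, and like my earlier example it's not meant to be run):

```nasm
; Assemble with:    nasm -f bin here.asm -o here.bin
; Disassemble with: ndisasm -b 32 here.bin
bits 32
org 0

    mov eax, $      ; B8 00 00 00 00 - '$' is 0, the start of this line
    mov ebx, $      ; BB 05 00 00 00 - '$' is 5, past the five bytes above
    mov ecx, $      ; B9 0A 00 00 00 - '$' is 10
```

Each operand is the offset of its own instruction's first byte, so the values go up by the length of the previous instruction.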

That's about the same thing as Cyrill said, but differently. If neither of us can explain it "right" (for you), we'll get someone else in here to take a crack at it. The concept is not "too difficult" for you to understand, trust me!

In doing some research on a question about debugging, I came across the idea that "Why doesn't my program work?" is because "Something you believe to be true, is not true." Debugging is the process of figuring out what you believe to be true, at each stage of your program, and verifying that it really is true. The more you "understand everything" and the less you "just do it", the more successful you'll be, I figure. Don't let it hang you up forever, though. You can "just do it and move on" and come back to it later, if need be.

(By the way, this is not "off-topic" in the slightest. It's a good question!)

As Betov used to say, "Courage!"

Best,
Frank