NASM - The Netwide Assembler
NASM Forum => Other Discussion => Topic started by: neurophobos on November 08, 2013, 12:30:36 PM
-
Hi everyone!
I'm just starting to program in NASM. I wrote a simple test program; here's its code:
buffer db 'PROVA'
times 64-$+buffer db '_'
mov eax, 4
xor ebx, ebx
inc ebx
mov ecx, message
mov edx, len
int 80h
mov eax, 1
mov ebx, 0
int 80h
message db '\u263a'
len equ $-message
That's all good with its output:
utente@laptop:~/programmazione/nasm$ ./prova ; echo $?
\u263a0
utente@laptop:~/programmazione/nasm$
But when I'm using gdb, I get these strange initial lines of code:
utente@laptop:~/programmazione/nasm$ gdb -q prova
Reading symbols from /home/utente/programmazione/nasm/prova...(no debugging symbols found)...done.
(gdb) break start
Breakpoint 1 at 0x8048060
(gdb) set disassembly intel
(gdb) disass
No frame selected.
(gdb) run
Starting program: /home/utente/programmazione/nasm/prova
Breakpoint 1, 0x08048060 in start ()
(gdb) disass
Dump of assembler code for function start:
=>0x08048060 <+0>: push eax
0x08048061 <+1>: push edx
0x08048062 <+2>: dec edi
0x08048063 <+3>: push esi
0x08048064 <+4>: inc ecx
0x08048065 <+5>: pop edi
0x08048066 <+6>: pop edi
0x08048067 <+7>: pop edi
0x08048068 <+8>: pop edi
0x08048069 <+9>: pop edi
0x0804806a <+10>: pop edi
0x0804806b <+11>: pop edi
0x0804806c <+12>: pop edi
0x0804806d <+13>: pop edi
0x0804806e <+14>: pop edi
0x0804806f <+15>: pop edi
0x08048070 <+16>: pop edi
0x08048071 <+17>: pop edi
0x08048072 <+18>: pop edi
0x08048073 <+19>: pop edi
0x08048074 <+20>: pop edi
0x08048075 <+21>: pop edi
0x08048076 <+22>: pop edi
0x08048077 <+23>: pop edi
0x08048078 <+24>: pop edi
0x08048079 <+25>: pop edi
0x0804807a <+26>: pop edi
0x0804807b <+27>: pop edi
0x0804807c <+28>: pop edi
0x0804807d <+29>: pop edi
0x0804807e <+30>: pop edi
0x0804807f <+31>: pop edi
0x08048080 <+32>: pop edi
0x08048081 <+33>: pop edi
0x08048082 <+34>: pop edi
0x08048083 <+35>: pop edi
0x08048084 <+36>: pop edi
0x08048085 <+37>: pop edi
0x08048086 <+38>: pop edi
0x08048087 <+39>: pop edi
0x08048088 <+40>: pop edi
0x08048089 <+41>: pop edi
0x0804808a <+42>: pop edi
0x0804808b <+43>: pop edi
0x0804808c <+44>: pop edi
0x0804808d <+45>: pop edi
0x0804808e <+46>: pop edi
---Type <return> to continue, or q <return> to quit---
0x0804808f <+47>: pop edi
0x08048090 <+48>: pop edi
0x08048091 <+49>: pop edi
0x08048092 <+50>: pop edi
0x08048093 <+51>: pop edi
0x08048094 <+52>: pop edi
0x08048095 <+53>: pop edi
0x08048096 <+54>: pop edi
0x08048097 <+55>: pop edi
0x08048098 <+56>: pop edi
0x08048099 <+57>: pop edi
0x0804809a <+58>: pop edi
0x0804809b <+59>: pop edi
0x0804809c <+60>: pop edi
0x0804809d <+61>: pop edi
0x0804809e <+62>: pop edi
0x0804809f <+63>: pop edi
0x080480a0 <+64>: mov eax,0x4
0x080480a5 <+69>: xor ebx,ebx
0x080480a7 <+71>: inc ebx
0x080480a8 <+72>: mov ecx,0x80480c0
0x080480ad <+77>: mov edx,0x6
0x080480b2 <+82>: int 0x80
0x080480b4 <+84>: mov eax,0x1
0x080480b9 <+89>: mov ebx,0x0
0x080480be <+94>: int 0x80
End of assembler dump.
(gdb)
Here are my two questions:
1) Why is nasm using pop edi instead of push edi? Is there no risk of damaging other data stored on the stack before these ops? I would have expected multiple pushes instead, to go up (i.e. toward lower addresses of the stack segment).
2) What do
0x08048061 <+1>: push edx
0x08048062 <+2>: dec edi
0x08048063 <+3>: push esi
0x08048064 <+4>: inc ecx
refer to? where did nasm get this block from?
Thank you for all your help and attention! Have a nice day!
Neuro
-
Hi Neuro,
I don't understand the question. I don't get what you're trying to do. Did you intend for "PROVA" and the underscores to be in your ".data" section? If it's in "section .text" (which it is), it'll be executed.
Nasm is doing what you told it to do. You're correct that this will damage the data on your stack. If you wanted "push edi", pad it with "W"s... but this doesn't seem very useful either... Multiple "push"es would result in lower values for esp, as you state. Multiple "pop"s are increasing esp. The addresses that gdb is showing are eip, not esp, though...
What are you trying to "test" with this?
Best,
Frank
-
Hi Frank! Thank you for your response.
I'm sorry, I posted my Italian code. I'll try to be clearer.
"Prova" means just "test", and the code
times 64-$+buffer db '_'
is just a reserving mem space loop.
I suppose (correct me if i'm wrong) that db ' '
would get the same effect, except for reserving empty space (I don't care for now).
The thing is that nasm did automatically convert this code into those pop edi
Why did it use a pop instruction? I wonder how this program did not end in a seg fault. As we stated, it should have used push edi (or something like it). Do I have to do something to instruct nasm not to use pop?
This program is just a test though, I just wanted to see how times 64-$+buffer db '_'
would have been converted in run-time debugging.
My main doubt is that I cannot see why the whole space between $eip+0 and $eip+63 (which is the space reserved by that instruction) is not filled with the same instruction (pop edi): the first 5 instructions aren't pop edi like the others.
And they're not result of my code.
The db is my first instruction (none of those five lines seems to be its run-time equivalent, nor do I think db should turn into a run-time instruction), hence the "times" loop goes up to the rest of the code, which is
mov eax, 4
xor ebx, ebx
inc ebx
mov ecx, message
mov edx, len
int 80h
mov eax, 1
mov ebx, 0
int 80h
message db '\u263a'
len equ $-message
consistent with gdb's one:
0x080480a0 <+64>: mov eax,0x4
0x080480a5 <+69>: xor ebx,ebx
0x080480a7 <+71>: inc ebx
0x080480a8 <+72>: mov ecx,0x80480c0
0x080480ad <+77>: mov edx,0x6
0x080480b2 <+82>: int 0x80
0x080480b4 <+84>: mov eax,0x1
0x080480b9 <+89>: mov ebx,0x0
0x080480be <+94>: int 0x80
So, basically, where do
push eax
push edx
dec edi
push esi
inc ecx
actually come from?
Again, thank you, have a nice day!
-
Hi Frank! Thank you for your response.
I'm sorry, I posted my Italian code. I'll try to be clearer.
"Prova" means just "test", and the code
times 64-$+buffer db '_'
is just a reserving mem space loop.
I suppose (correct me if i'm wrong) that db ' '
would get the same effect, except for reserving empty space (I don't care for now).
What Frank is saying, however, is that you put that data into the .text section, it should be put into a different section (like .data or .rodata). The way it is, it's being translated as instructions instead of separated into a different section.
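A minimal sketch of what that could look like for the original test program, assuming 32-bit Linux and a `_start` entry point. The file name and the build commands in the comments are illustrative assumptions, not taken from the original post:

```nasm
; prova_data.asm (hypothetical file name)
; Assumed build steps:
;   nasm -f elf32 prova_data.asm -o prova_data.o
;   ld -m elf_i386 prova_data.o -o prova_data

global _start

section .data                          ; bytes here are data, never fetched as code
buffer:  db 'PROVA'
         times 64-($-buffer) db '_'    ; pad buffer with '_' up to 64 bytes
message: db '\u263a'                   ; single quotes: no escapes, 6 literal bytes
len      equ $-message

section .text                          ; only real instructions live here
_start:
    mov eax, 4                         ; sys_write
    mov ebx, 1                         ; fd 1 = stdout
    mov ecx, message
    mov edx, len
    int 80h

    mov eax, 1                         ; sys_exit
    xor ebx, ebx                       ; exit status 0
    int 80h
```

With the data in its own section, disassembling `_start` in gdb should show only the `mov`/`xor`/`int` instructions, since the string bytes are no longer in the path of execution.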
The thing is that nasm did automatically convert this code into those pop edi
Why did it use a pop instruction?
Nasm, like all assemblers, takes instructions and converts them into blocks of data (i.e. machine code). By coincidence, the '_' character is ASCII code 0x5F, which is also the machine code for "POP EDI".
I wonder how this program did not go into a seg fault.
Because you didn't read enough from the stack to access invalid memory, and you didn't use stack-based control instructions like RET that would have transferred execution to protected memory. In other words, the "POP EDI"s shown are simply reading data on the stack that the OS has left there for you (like command line arguments, environment variables, the ELF auxiliary vector, etc.).
As we stated, it should have used push edi (or something like it). Do I have to do something to instruct nasm not to use pop?
You didn't tell it to put "PUSH EDI"; you told it to put "_" in the code section, which (as I said before) is the same byte as "POP EDI".
My main doubt is that I cannot see why the whole space between $eip+0 and $eip+63 (which is the space reserved by that instruction) is not filled with the same instruction (pop edi): the first 5 instructions aren't pop edi like the others.
And they're not result of my code.
But they are a result of your code. The "PROVA" is also being inserted as machine code.
=>0x08048060 <+0>: push eax ; DB "P"
0x08048061 <+1>: push edx ; DB "R"
0x08048062 <+2>: dec edi ; DB "O"
0x08048063 <+3>: push esi ; DB "V"
0x08048064 <+4>: inc ecx ; DB "A"
0x08048065 <+5>: pop edi ; DB "_"
0x08048066 <+6>: pop edi ; DB "_"
0x08048067 <+7>: pop edi ; DB "_"
The db is my first instruction (none of those five lines seems to be its run-time equivalent, nor do I think db should turn into a run-time instruction), hence the "times" loop goes up to the rest of the code
DB is not an instruction, it's a directive. This directive tells the assembler to put the bytes that follow directly into the executable where they appear. NASM handles converting ASCII strings into byte sequences for you, so you can use:
DB "PROVA"
instead of:
DB 0x50
DB 0x52
DB 0x4F
DB 0x56
DB 0x41
or:
push eax
push edx
dec edi
push esi
inc ecx
Which are all the same thing after the assembler is done with it. :)
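One way to check this for yourself is to put all three spellings in one file and look at the raw output. This is only a sketch: the file name and labels are made up for illustration, and it assumes 32-bit mode, where these one-byte encodings apply:

```nasm
; equiv.asm (hypothetical file name)
; Assumed build step for a flat binary with no headers:
;   nasm -f bin equiv.asm -o equiv.bin

bits 32

as_string:  db "PROVA"                        ; quoted string -> ASCII bytes
as_bytes:   db 0x50, 0x52, 0x4F, 0x56, 0x41   ; the same bytes spelled in hex
as_code:    push eax                          ; 0x50 = 'P'
            push edx                          ; 0x52 = 'R'
            dec  edi                          ; 0x4F = 'O'
            push esi                          ; 0x56 = 'V'
            inc  ecx                          ; 0x41 = 'A'
```

Viewing equiv.bin in a hex viewer should show the same five bytes, 50 52 4F 56 41, three times in a row; nothing in the file records which spelling produced them.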
-
Here's another way of looking at it, coming at it from the other end, so to speak:
global _start
section .text
_start:
nop
mov edx, stuff_len
mov ecx, stuff
mov ebx, 1
mov eax, 4
int 80h
mov eax, 1
xor ebx, ebx
int 80h
; although this appears to be "code", it will not be executed (we've already exited)
stuff:
push eax
push edx
dec edi
push esi
inc ecx
stuff_len equ $ - stuff
Everything in your file is "just bytes". If the CPU (or debugger) stumbles into 'em, they'll be interpreted as code, and executed (if possible). If they're printed, they'll be interpreted as ascii codes for characters. In another position (in a linkable object file) they might be interpreted as instructions to the linker, how to link it. If in an executable, they might be "code" or "data" or instructions to the loader. It's all "just bytes" - how they're interpreted depends on where they are.
An "old school" way of coding would look like:
jmp start
db "hello world"
db "other data"
start:
; real "code" continues from here...
You could do this with "PROVA" and the underscores as an alternative to putting it in the ".data" section. We don't want this to be executed - didn't crash in this case, but you were just lucky.
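A rough, complete version of that idea applied to the "PROVA" buffer, assuming 32-bit Linux; the label name `code` and the choice to print the buffer are illustrative additions:

```nasm
global _start

section .text
_start:
    jmp  code                        ; jump over the data so it is never executed

buffer: db 'PROVA'
        times 64-($-buffer) db '_'   ; pad with '_' up to 64 bytes

code:
    mov eax, 4                       ; sys_write
    mov ebx, 1                       ; stdout
    mov ecx, buffer                  ; reading from .text is fine, only writing is not
    mov edx, 64
    int 80h

    mov eax, 1                       ; sys_exit
    xor ebx, ebx
    int 80h
```

A disassembler will still show the buffer as push/pop instructions, because it cannot know those bytes are data, but the CPU never reaches them.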
I've got a file (can't lay my hands on it at the moment) which contains entirely "garbage code" - real instructions, but they don't make any sense. If assembled and viewed (not run), it looks like "HELLO WORLD". Merely a toy - not intended to be useful for anything.
This must seem pretty strange when you first encounter it, but you'll get used to it.
Best,
Frank
-
Hello my friends.
First of all thanks, your answers are very clear and complete.
I continue bothering you with other questions popped up reading this thread.
If I got it right, "DB is not an instruction, it's a directive"
basically means that it's not translated into an opcode like other instructions (e.g. mov eax, 1). This means that whatever I put in a DB directive will be stored in main memory as it is. In this case it shows up as instructions because we're inside the .text segment, and the instructions are derived from the ASCII values of what I stored.
My question here is where I can get a chart of equivalent opcodes/ASCII from? I suppose it could be possible to write an entire program with a single "db <nonsense string>" (here by "program" I mean a meaningless program with just opcodes in text segment). This chart would be of great interest.
My great doubt is that, based upon what i've read here, nasm interprets as instructions the equivalent ascii codes of a db statement like this one. I thought this happens because i define db 'bla bla' in .text segment, so nasm tries to convert in opcodes what it supposes to be just instructions and no data declarations.
So I gave it a try, creating this:
global _start
SECTION .text
_start:
db 'PROVAAAaW'
mov eax, 1
mov ebx, 1
int 80h
SECTION .data
db 'PROVAAAAaW'
mov eax, 1
xor ecx, ecx
int 80h
With this, I wouldn't have expected to find the db statement in section .data following the .text code in gdb:
(gdb) disass
Dump of assembler code for function _start:
=> 0x08048080 <+0>: push eax
0x08048081 <+1>: push edx
0x08048082 <+2>: dec edi
0x08048083 <+3>: push esi
0x08048084 <+4>: inc ecx
0x08048085 <+5>: inc ecx
0x08048086 <+6>: inc ecx
0x08048087 <+7>: popa
0x08048088 <+8>: push edi
0x08048089 <+9>: mov eax,0x1
0x0804808e <+14>: mov ebx,0x1
0x08048093 <+19>: int 0x80
End of assembler dump.
(gdb) x/25i $eip
=> 0x8048080 <_start>: push eax
0x8048081 <_start+1>: push edx
0x8048082 <_start+2>: dec edi
0x8048083 <_start+3>: push esi
0x8048084 <_start+4>: inc ecx
0x8048085 <_start+5>: inc ecx
0x8048086 <_start+6>: inc ecx
0x8048087 <_start+7>: popa
0x8048088 <_start+8>: push edi
0x8048089 <_start+9>: mov eax,0x1
0x804808e <_start+14>: mov ebx,0x1
0x8048093 <_start+19>: int 0x80
0x8048095: add BYTE PTR [eax],al
0x8048097: add BYTE PTR [eax+0x52],dl
0x804809a: dec edi
0x804809b: push esi
0x804809c: inc ecx
0x804809d: inc ecx
0x804809e: inc ecx
0x804809f: inc ecx
0x80480a0: popa
0x80480a1: push edi
0x80480a2: mov eax,0x1
---Type <return> to continue, or q <return> to quit---
0x80480a7: xor ecx,ecx
0x80480a9: int 0x80
(gdb)
No trace of that "db 'PROVAAAAaW'" as string in $esp (initialized data should be in stack segment right?).
As you can see after $eip+19, there are again those fake instructions, but the first two are being translated differently.
In .text section
0x08048080 <+0>: push eax
0x08048081 <+1>: push edx
In .data section:
0x8048095: add BYTE PTR [eax],al
0x8048097: add BYTE PTR [eax+0x52],dl
I thought it happens because I do not store an actual variable, so I tried a second version, with this code:
global _start
SECTION .text
_start:
db 'PROVAAAaW'
mov eax, 1
mov ebx, 1
int 80h
SECTION .data
aw db 'PROVAAAAaW'
mov eax, 1
xor ecx, ecx
int 80h
but I get the same result, no trace of "PROVAAAAaW" as string, nor aw (except for its normal presence in symbol table of elf object file).
utente@utente-virtual-machine:~/Scrivania/programmazione/nasm$ readelf -s prova2
Symbol table '.symtab' contains 10 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 08048080 0 SECTION LOCAL DEFAULT 1
2: 08049098 0 SECTION LOCAL DEFAULT 2
3: 00000000 0 FILE LOCAL DEFAULT ABS prova2.asm
4: 08049098 0 NOTYPE LOCAL DEFAULT 2 aw
5: 00000000 0 FILE LOCAL DEFAULT ABS
6: 08048080 0 NOTYPE GLOBAL DEFAULT 1 _start
7: 080490ab 0 NOTYPE GLOBAL DEFAULT 2 __bss_start
8: 080490ab 0 NOTYPE GLOBAL DEFAULT 2 _edata
9: 080490ac 0 NOTYPE GLOBAL DEFAULT 2 _end
utente@utente-virtual-machine:~/Scrivania/programmazione/nasm$
Why does this happen? Again thank you for your patience, I'm trying to get the functioning of nasm itself rather than actually programming something.
Best,
Neuro
-
My question here is where I can get a chart of equivalent opcodes/ASCII from?
There isn't one. The ASCII code chart is available online from a variety of sources, such as this one (http://www.asciitable.com/index/asciifull.gif). Opcodes, however, aren't as easy as ASCII. If you take a look at the x86 Opcode Table (http://sparksandflames.com/files/x86InstructionChart.html) you'll notice that not all instructions are just one byte; many of them can be much longer (up to 15 bytes including prefixes under IA-32).
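For the printable range that showed up in this thread you can build a small chart yourself. The sketch below assumes 32-bit mode (in 64-bit mode the bytes 0x40-0x4F are REX prefixes instead), and the file name is hypothetical:

```nasm
; chart.asm (hypothetical file name)
; Assumed build step: nasm -f bin chart.asm -o chart.bin
; Each instruction below assembles to the single byte shown in its comment.

bits 32

inc  eax    ; 0x40 '@'
inc  ecx    ; 0x41 'A'
inc  edx    ; 0x42 'B'
inc  ebx    ; 0x43 'C'
inc  esp    ; 0x44 'D'
inc  ebp    ; 0x45 'E'
inc  esi    ; 0x46 'F'
inc  edi    ; 0x47 'G'
dec  eax    ; 0x48 'H'
dec  ecx    ; 0x49 'I'
dec  edx    ; 0x4A 'J'
dec  ebx    ; 0x4B 'K'
dec  esp    ; 0x4C 'L'
dec  ebp    ; 0x4D 'M'
dec  esi    ; 0x4E 'N'
dec  edi    ; 0x4F 'O'
push eax    ; 0x50 'P'
push ecx    ; 0x51 'Q'
push edx    ; 0x52 'R'
push ebx    ; 0x53 'S'
push esp    ; 0x54 'T'
push ebp    ; 0x55 'U'
push esi    ; 0x56 'V'
push edi    ; 0x57 'W'
pop  eax    ; 0x58 'X'
pop  ecx    ; 0x59 'Y'
pop  edx    ; 0x5A 'Z'
pop  ebx    ; 0x5B '['
pop  esp    ; 0x5C '\'
pop  ebp    ; 0x5D ']'
pop  esi    ; 0x5E '^'
pop  edi    ; 0x5F '_'
pusha       ; 0x60 '`'
popa        ; 0x61 'a'
```

This only covers a handful of one-byte instructions; most other characters decode as the first byte of a longer instruction, which is why a general ASCII-to-opcode chart does not really exist.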
I suppose it could be possible to write an entire program with a single "db <nonsense string>" (here by "program" I mean a meaningless program with just opcodes in text segment).
Yes, it's possible, but not really recommended.
My great doubt is that, based upon what i've read here, nasm interprets as instructions the equivalent ascii codes of a db statement like this one. I thought this happens because i define db 'bla bla' in .text segment, so nasm tries to convert in opcodes what it supposes to be just instructions and no data declarations.
When NASM sees DB, it just converts to bytes. It doesn't care about codes, it just converts to bytes. When NASM sees a quoted string, it converts it into a list of ASCII bytes which then get passed to DB.
So I gave it a try, creating this:
global _start
SECTION .text
_start:
db 'PROVAAAaW'
mov eax, 1
mov ebx, 1
int 80h
SECTION .data
db 'PROVAAAAaW'
mov eax, 1
xor ecx, ecx
int 80h
I have no idea why you did that. You should have probably done something like:
global _start
SECTION .text
_start:
mov eax, 1
mov ebx, 1
int 80h
SECTION .data
VAR1: db 'PROVAAAaW'
As you can see, I didn't add any DB stuff to the .text section. With this code, GDB won't fubar your output. Also, the reason I added a label "VAR1" is so that in GDB (as long as you gave both NASM and GCC the '-g' option) you can use: printf "%s\n", &VAR1
to display the contents of that variable.
With this, I wouldn't have expected to find the db statement in section .data following the .text code in gdb:
Well, who said that your .data is after your .text? The reason you put things in sections is to group data together. The linker gets to decide where that data ends up: it might be after your .text, it might be before. Unless you write your own LD scripts (which is beyond the scope of this forum), you don't really have a say in where things end up.
(gdb) disass
Dump of assembler code for function _start:
=> 0x08048080 <+0>: push eax
0x08048081 <+1>: push edx
0x08048082 <+2>: dec edi
0x08048083 <+3>: push esi
0x08048084 <+4>: inc ecx
0x08048085 <+5>: inc ecx
0x08048086 <+6>: inc ecx
0x08048087 <+7>: popa
0x08048088 <+8>: push edi
0x08048089 <+9>: mov eax,0x1
0x0804808e <+14>: mov ebx,0x1
0x08048093 <+19>: int 0x80
End of assembler dump.
(gdb) x/25i $eip
=> 0x8048080 <_start>: push eax
0x8048081 <_start+1>: push edx
0x8048082 <_start+2>: dec edi
0x8048083 <_start+3>: push esi
0x8048084 <_start+4>: inc ecx
0x8048085 <_start+5>: inc ecx
0x8048086 <_start+6>: inc ecx
0x8048087 <_start+7>: popa
0x8048088 <_start+8>: push edi
0x8048089 <_start+9>: mov eax,0x1
0x804808e <_start+14>: mov ebx,0x1
0x8048093 <_start+19>: int 0x80
0x8048095: add BYTE PTR [eax],al
0x8048097: add BYTE PTR [eax+0x52],dl
0x804809a: dec edi
0x804809b: push esi
0x804809c: inc ecx
0x804809d: inc ecx
0x804809e: inc ecx
0x804809f: inc ecx
0x80480a0: popa
0x80480a1: push edi
0x80480a2: mov eax,0x1
---Type <return> to continue, or q <return> to quit---
0x80480a7: xor ecx,ecx
0x80480a9: int 0x80
(gdb)
No trace of that "db 'PROVAAAAaW'" as string in $esp (initialized data should be in stack segment right?).
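No. Initialized data goes into the .data section and uninitialized data into .bss; neither is on the stack, which holds things like the arguments and environment the OS passes in, return addresses, and whatever you push. Where .data ends up in memory depends on the object format and the linker. For a statically linked, non-relocatable executable the addresses are fixed at link time; for position-independent executables the load address usually changes each time it's executed.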
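As a rough illustration of the two data sections, here is a sketch assuming 32-bit Linux; the labels `msg` and `scratch` are invented for the example:

```nasm
global _start

section .data                   ; initialized data: these bytes are stored in the file
msg:      db 'PROVA', 10        ; the string plus a newline
msg_len   equ $-msg

section .bss                    ; uninitialized data: only the size is recorded
scratch:  resb 64               ; 64 bytes reserved, no contents in the file

section .text
_start:
    mov esi, msg                ; source: initialized data
    mov edi, scratch            ; destination: reserved space in .bss
    mov ecx, msg_len
    cld                         ; copy forward
    rep movsb                   ; copy msg_len bytes from [esi] to [edi]

    mov eax, 4                  ; sys_write
    mov ebx, 1                  ; stdout
    mov ecx, scratch            ; print the copy, not the original
    mov edx, msg_len
    int 80h

    mov eax, 1                  ; sys_exit
    xor ebx, ebx
    int 80h
```

Neither `msg` nor `scratch` is anywhere near esp; in gdb they can be inspected by name (for example with `x/s`) rather than by looking at the stack.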
-
For one thing, you're looking at the output from gdb and blaming it on Nasm. I assure you that Nasm put your string in your code, but you didn't ask gdb to display it as a string, you asked gdb to disassemble it... so it did. I don't know if there's a way to ask gdb to display ascii text or not - I imagine there is. I'm not very good with gdb!
There used to be a Unix utility called "dump" but it does not exist on my system, and "apt-get" is being "apt-don't-get". I also wrote my own version, but apparently that got left behind on my "old" machine. I'll have to go back and fetch it, or write it over. Wasn't very difficult, but I'm not in the mood right now. I think seeing "just the bytes" - which is what Nasm produces - would help you understand what's going on here. I'll get back to ya on this "if the spirit moves me". Hang in there!
Best,
Frank
-
So, everything I write quoted is translated in ASCII.
In memory there are just bytes; they're displayed in gdb as instructions because I'm disassembling with x/i, so it is gdb (not nasm) that interprets those ASCII bytes as the instructions they could mean (I suppose gdb has a table to map the opcodes it finds to the mnemonics it displays when disassembling programs).
It's a coincidence that those single characters all happen to be single-byte instructions
50-> PUSH eax ;P
52-> PUSH edx ;R
4F-> DEC edi ;O
56-> PUSH esi ;V
41-> INC ecx ;A
61-> POPA ;a
57-> PUSH edi ;W
otherwise I'd have found incomplete opcodes (or complete ones, since gdb may pick up more bytes to make a sensible interpretation?).
DB is a directive, as it stores what I tell it to store as it is. There's no conversion into opcodes because it's not an operation. For the operations, nasm processes them and converts them into their corresponding machine-code bytes.
I bet that the loader understands what is instruction and what operand (since the hex values combos may coincide or overlap) because a protocol of conversion/fetching is followed.
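On the point about picking up more bytes: the first byte of an instruction tells the decoder (in the CPU, and likewise in gdb's disassembler) whether more bytes follow. The earlier gdb output at 0x8048097 is an example, where a zero padding byte swallowed the 'P' and 'R' as its operands. A small sketch, with made-up labels and assuming 32-bit mode:

```nasm
; multibyte.asm (hypothetical file name)
; Assumed build step: nasm -f bin multibyte.asm -o multibyte.bin

bits 32

as_data:  db 0x00, 'P', 'R'                  ; a zero byte followed by "PR"
as_code:  add byte [byte eax+0x52], dl       ; opcode 00, ModRM 50, displacement 52
```

Both lines should assemble to the bytes 00 50 52, which is why the "PR" at the start of the string disappeared into one `add` instruction once the disassembly began on a padding byte.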
You've been very precious and kind, I thank you all very much for your explanations.
Neuro