NASM - The Netwide Assembler

NASM Forum => Other Discussion => Topic started by: neurophobos on November 08, 2013, 12:30:36 PM

Title: Strange Padding with times instruction
Post by: neurophobos on November 08, 2013, 12:30:36 PM: Hi everyone!
I'm just starting to program in NASM. I wrote a simple test program; here is its code:

Code: [Select]
buffer db 'PROVA'
times 64-$+buffer db '_'
mov eax, 4
xor ebx, ebx
inc ebx
mov ecx, message
mov edx, len
int 80h
mov eax, 1
mov ebx, 0
int 80h
message db '\u263a'
len equ $-message
It runs fine, and this is its output:
Code: [Select]
utente@laptop:~/programmazione/nasm$ ./prova ; echo $?
\u263a0
utente@laptop:~/programmazione/nasm$
But when I'm using gdb, I get these strange initial lines of code:

Code: [Select]
utente@laptop:~/programmazione/nasm$ gdb -q prova
Reading symbols from /home/utente/programmazione/nasm/prova...(no debugging symbols found)...done.
(gdb) break start
Breakpoint 1 at 0x8048060
(gdb) set disassembly intel
(gdb) disass
No frame selected.
(gdb) run
Starting program: /home/utente/programmazione/nasm/prova

Breakpoint 1, 0x08048060 in start ()
(gdb) disass
Dump of assembler code for function start:
=> 0x08048060 <+0>: push eax
   0x08048061 <+1>: push edx
   0x08048062 <+2>: dec edi
   0x08048063 <+3>: push esi
   0x08048064 <+4>: inc ecx
   0x08048065 <+5>: pop edi
   0x08048066 <+6>: pop edi
   0x08048067 <+7>: pop edi
   0x08048068 <+8>: pop edi
   0x08048069 <+9>: pop edi
   0x0804806a <+10>: pop edi
   0x0804806b <+11>: pop edi
   0x0804806c <+12>: pop edi
   0x0804806d <+13>: pop edi
   0x0804806e <+14>: pop edi
   0x0804806f <+15>: pop edi
   0x08048070 <+16>: pop edi
   0x08048071 <+17>: pop edi
   0x08048072 <+18>: pop edi
   0x08048073 <+19>: pop edi
   0x08048074 <+20>: pop edi
   0x08048075 <+21>: pop edi
   0x08048076 <+22>: pop edi
   0x08048077 <+23>: pop edi
   0x08048078 <+24>: pop edi
   0x08048079 <+25>: pop edi
   0x0804807a <+26>: pop edi
   0x0804807b <+27>: pop edi
   0x0804807c <+28>: pop edi
   0x0804807d <+29>: pop edi
   0x0804807e <+30>: pop edi
   0x0804807f <+31>: pop edi
   0x08048080 <+32>: pop edi
   0x08048081 <+33>: pop edi
   0x08048082 <+34>: pop edi
   0x08048083 <+35>: pop edi
   0x08048084 <+36>: pop edi
   0x08048085 <+37>: pop edi
   0x08048086 <+38>: pop edi
   0x08048087 <+39>: pop edi
   0x08048088 <+40>: pop edi
   0x08048089 <+41>: pop edi
   0x0804808a <+42>: pop edi
   0x0804808b <+43>: pop edi
   0x0804808c <+44>: pop edi
   0x0804808d <+45>: pop edi
   0x0804808e <+46>: pop edi
---Type <return> to continue, or q <return> to quit---
   0x0804808f <+47>: pop edi
   0x08048090 <+48>: pop edi
   0x08048091 <+49>: pop edi
   0x08048092 <+50>: pop edi
   0x08048093 <+51>: pop edi
   0x08048094 <+52>: pop edi
   0x08048095 <+53>: pop edi
   0x08048096 <+54>: pop edi
   0x08048097 <+55>: pop edi
   0x08048098 <+56>: pop edi
   0x08048099 <+57>: pop edi
   0x0804809a <+58>: pop edi
   0x0804809b <+59>: pop edi
   0x0804809c <+60>: pop edi
   0x0804809d <+61>: pop edi
   0x0804809e <+62>: pop edi
   0x0804809f <+63>: pop edi
   0x080480a0 <+64>: mov eax,0x4
   0x080480a5 <+69>: xor ebx,ebx
   0x080480a7 <+71>: inc ebx
   0x080480a8 <+72>: mov ecx,0x80480c0
   0x080480ad <+77>: mov edx,0x6
   0x080480b2 <+82>: int 0x80
   0x080480b4 <+84>: mov eax,0x1
   0x080480b9 <+89>: mov ebx,0x0
   0x080480be <+94>: int 0x80
End of assembler dump.
(gdb)
Here are my two questions:
1) Why is nasm using pop edi instead of push edi? Is there no risk of damaging other data stored on the stack before these operations? I would have expected multiple pushes, moving up (i.e. toward lower addresses of the stack segment).
2) What do
Code: [Select]
0x08048061 <+1>: push edx
0x08048062 <+2>: dec edi
0x08048063 <+3>: push esi
0x08048064 <+4>: inc ecx
refer to? Where did nasm get this block from?

Thank you for all your help and attention! Have a nice day!

Neuro
Title: Re: Strange Padding with times instruction
Post by: Frank Kotler on November 08, 2013, 02:20:57 PM: Hi Neuro,

I don't understand the question. I don't get what you're trying to do. Did you intend for "PROVA" and the underscores to be in your ".data" section? If it's in "section .text" (which it is), it'll be executed.

Nasm is doing what you told it to do. You're correct that this will damage the data on your stack. If you wanted "push edi", pad it with "W"s... but this doesn't seem very useful either... Multiple "push"es would result in lower values for esp, as you state. Multiple "pop"s are increasing esp. The addresses that gdb is showing are eip, not esp, though...

What are you trying to "test" with this?

Best,
Frank
Title: Re: Strange Padding with times instruction
Post by: neurophobos on November 08, 2013, 09:02:10 PM: Hi Frank! Thank you for your response.

I'm sorry, I posted my Italian code. I'll try to be clearer.
"Prova" just means "test", and the code
Code: [Select]
times 64-$+buffer db '_'
is just a loop that reserves memory space.
I suppose (correct me if I'm wrong) that
Code: [Select]
db ' '
would have the same effect, except that it reserves empty space (I don't care about that for now).

The thing is that nasm automatically converted this code into those
Code: [Select]
pop edi
Why did it use a pop instruction? I wonder how this program did not segfault. As we said, it should have used push edi (or something similar). Do I have to do something to instruct nasm not to use pop?

This program is just a test, though. I just wanted to see how
Code: [Select]
times 64-$+buffer db '_'
would be converted when debugging at run time.

My main doubt is that I cannot understand why the space between $eip+0 and $eip+63 (which is the space reserved by that instruction) is not entirely filled with the same instruction (pop edi): the first 5 instructions aren't pop edi like the others.
And they're not the result of my code.
The db is my first instruction (none of those five lines seems to be its run-time equivalent, but I don't think db should turn into a run-time instruction either), hence the "times" loop runs up to the rest of the code, which is
Code: [Select]
mov eax, 4
xor ebx, ebx
inc ebx
mov ecx, message
mov edx, len
int 80h
mov eax, 1
mov ebx, 0
int 80h
message db '\u263a'
len equ $-message
consistent with gdb's output:

Code: [Select]
0x080480a0 <+64>: mov eax,0x4
0x080480a5 <+69>: xor ebx,ebx
0x080480a7 <+71>: inc ebx
0x080480a8 <+72>: mov ecx,0x80480c0
0x080480ad <+77>: mov edx,0x6
0x080480b2 <+82>: int 0x80
0x080480b4 <+84>: mov eax,0x1
0x080480b9 <+89>: mov ebx,0x0
0x080480be <+94>: int 0x80
So, basically, where do
Code: [Select]
push eax
push edx
dec edi
push esi
inc ecx
actually come from?

Again, thank you, have a nice day!
Title: Re: Strange Padding with times instruction
Post by: Bryant Keller on November 08, 2013, 10:29:59 PM: Quote from: neurophobos on November 08, 2013, 09:02:10 PM
Hi Frank! Thank you for your response.

I'm sorry, I posted my Italian code. I'll try to be clearer.
"Prova" just means "test", and the code
Code: [Select]
times 64-$+buffer db '_'
is just a loop that reserves memory space.
I suppose (correct me if I'm wrong) that
Code: [Select]
db ' '
would have the same effect, except that it reserves empty space (I don't care about that for now).

What Frank is saying, however, is that you put that data into the .text section; it should go into a different section (like .data or .rodata). As it stands, the bytes sit in the code stream and get treated as instructions instead of being separated into a different section.
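
A minimal sketch of that advice, assuming a 32-bit Linux target built with nasm -f elf32 and ld -m elf_i386: the same test program with the buffer and message moved into .data, so only real instructions remain in .text. The _start label and the scratch name are illustrative additions, not part of the original post.

```nasm
; Sketch only: same test, but data lives in .data and code in .text.
; Assumed build: nasm -f elf32 prova.asm && ld -m elf_i386 -o prova prova.o
global _start

section .data
buffer  db 'PROVA'
        times 64-($-buffer) db '_'  ; pad buffer to 64 bytes with '_'
message db '\u263a'                 ; single quotes: 6 literal characters
len     equ $-message

section .bss
scratch resb 64                     ; hypothetical: 64 reserved bytes, not stored in the file

section .text
_start:
        mov eax, 4                  ; sys_write
        mov ebx, 1                  ; fd 1 = stdout
        mov ecx, message
        mov edx, len
        int 80h
        mov eax, 1                  ; sys_exit
        xor ebx, ebx                ; exit status 0
        int 80h
```

With this layout, disassembling _start in gdb should show only the mov/xor/int instructions, because the string bytes are no longer in the code stream.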

Quote from: neurophobos on November 08, 2013, 09:02:10 PM
The thing is that nasm automatically converted this code into those
Code: [Select]
pop edi
Why did it use a pop instruction?

Nasm, like all assemblers, takes instructions and converts them into blocks of data (i.e. machine code). By coincidence, the '_' character is ASCII code 0x5F, which is also the machine code for "POP EDI".

Quote from: neurophobos on November 08, 2013, 09:02:10 PM
I wonder how this program did not go into a seg fault.

Because you didn't pop enough from the stack to reach invalid memory, and you didn't use stack-based control instructions like RET that would transfer execution to protected memory. In other words, the "POP EDI"s shown are simply reading data that the OS has left on the stack for you (command line arguments, environment variables, the ELF auxiliary vector, etc.).
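
To illustrate what those pops were reading: on 32-bit Linux the kernel places argc at the top of the stack when the process starts, followed by the argv pointers. The following sketch (an illustration, not code from the thread) pops that first value and returns it as the exit status. It assumes the same nasm -f elf32 / ld -m elf_i386 build.

```nasm
; argc.asm (hypothetical file name): return argc as the exit status.
global _start

section .text
_start:
        pop ebx         ; at process entry, [esp] holds argc
        mov eax, 1      ; sys_exit
        int 80h         ; exit status = argc
```

Run with no arguments and followed by echo $?, this should report 1, since argc counts the program name itself.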

Quote from: neurophobos on November 08, 2013, 09:02:10 PM
As we said, it should have used push edi (or something similar). Do I have to do something to instruct nasm not to use pop?

You didn't tell it to put "PUSH EDI"; you told it to put "_" in the code section, which (as I said before) is the same byte as the opcode for "POP EDI".
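
A short sketch of that equivalence, assuming a flat binary assembled with nasm -f bin -l same.lst same.asm (the file names are made up): the listing file shows the byte emitted for each line, so the character and the instruction can be compared directly. The 'W' case is the one Frank mentioned for getting "push edi".

```nasm
bits 32
        db '_'          ; 5F
        pop edi         ; 5F  (same byte as '_')
        db 'W'          ; 57
        push edi        ; 57  (same byte as 'W')
```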

Quote from: neurophobos on November 08, 2013, 09:02:10 PM
My main doubt is that I cannot understand why the space between $eip+0 and $eip+63 (which is the space reserved by that instruction) is not entirely filled with the same instruction (pop edi): the first 5 instructions aren't pop edi like the others.
And they're not the result of my code.

But they are a result of your code. The "PROVA" is also being inserted as machine code.

Code: [Select]
=> 0x08048060 <+0>: push eax ; DB "P"
   0x08048061 <+1>: push edx ; DB "R"
   0x08048062 <+2>: dec edi  ; DB "O"
   0x08048063 <+3>: push esi ; DB "V"
   0x08048064 <+4>: inc ecx  ; DB "A"
   0x08048065 <+5>: pop edi  ; DB "_"
   0x08048066 <+6>: pop edi  ; DB "_"
   0x08048067 <+7>: pop edi  ; DB "_"
Quote from: neurophobos on November 08, 2013, 09:02:10 PM
The db is my first instruction (none of those five lines seems to be its run-time equivalent, but I don't think db should turn into a run-time instruction either), hence the "times" loop runs up to the rest of the code

DB is not an instruction, it's a directive. This directive tells the assembler to put the bytes that follow directly into the output at the point where they appear. NASM handles converting ASCII strings into byte sequences for you, so you can use:

Code: [Select]
DB "PROVA"
instead of:

Code: [Select]
DB 0x50
DB 0x52
DB 0x4F
DB 0x56
DB 0x41
or:

Code: [Select]
push eax
push edx
dec edi
push esi
inc ecx
Which are all the same thing after the assembler is done with it. :)
Title: Re: Strange Padding with times instruction
Post by: Frank Kotler on November 09, 2013, 03:12:32 AM: Here's another way of looking at it, coming at it from the other end, so to speak:

Code: [Select]
global _start

section .text
_start:
    nop
    mov edx, stuff_len
    mov ecx, stuff
    mov ebx, 1
    mov eax, 4
    int 80h

    mov eax, 1
    xor ebx, ebx
    int 80h

; although this appears to be "code", it will not be executed (we've already exited)
stuff:
    push eax
    push edx
    dec edi
    push esi
    inc ecx
stuff_len equ $ - stuff
Everything in your file is "just bytes". If the CPU (or debugger) stumbles into 'em, they'll be interpreted as code, and executed (if possible). If they're printed, they'll be interpreted as ascii codes for characters. In another position (in a linkable object file) they might be interpreted as instructions to the linker, how to link it. If in an executable, they might be "code" or "data" or instructions to the loader. It's all "just bytes" - how they're interpreted depends on where they are.

An "old school" way of coding would look like:
Code: [Select]
jmp start
db "hello world"
db "other data"
start:
; real "code" continues from here...
You could do this with "PROVA" and the underscores as an alternative to putting it in the ".data" section. We don't want this to be executed - didn't crash in this case, but you were just lucky.

I've got a file (can't lay my hands on it at the moment) which contains entirely "garbage code" - real instructions, but they don't make any sense. If assembled and viewed (not run), it looks like "HELLO WORLD". Merely a toy - not intended to be useful for anything.

This must seem pretty strange when you first encounter it, but you'll get used to it.

Best,
Frank
Title: Re: Strange Padding with times instruction
Post by: neurophobos on November 11, 2013, 11:49:09 AM: Hello my friends.
First of all thanks, your answers are very clear and complete.
I'll keep bothering you with other questions that popped up while reading this thread.
If I got it right,
Quote
DB it's not an instruction, it's a directive
basically means that it's not translated into an opcode like other instructions (e.g. mov eax, 1). This means that whatever I put in a DB directive will be stored in main memory as it is. In this case it shows up as instructions because we're inside the .text segment, and the instructions are derived from the ASCII values of what I stored.
My question here is: where can I get a chart of equivalent opcodes/ASCII? I suppose it could be possible to write an entire program with a single "db <nonsense string>" (here by "program" I mean a meaningless program with just opcodes in text segment). This chart would be of great interest.
My main doubt is that, based on what I've read here, nasm interprets as instructions the ASCII codes of a db statement like this one. I thought this happens because I define db 'bla bla' in the .text segment, so nasm tries to convert into opcodes what it assumes to be instructions rather than data declarations.
So I gave it a try, creating this:
Code: [Select]
global _start

SECTION .text
_start:
db 'PROVAAAaW'
mov eax, 1
mov ebx, 1
int 80h

SECTION .data
db 'PROVAAAAaW'
mov eax, 1
xor ecx, ecx
int 80h
With this, I wouldn't have expected to find the db statement from section .data following the .text code in gdb:
Code: [Select]
(gdb) disass
Dump of assembler code for function _start:
=> 0x08048080 <+0>: push eax
   0x08048081 <+1>: push edx
   0x08048082 <+2>: dec edi
   0x08048083 <+3>: push esi
   0x08048084 <+4>: inc ecx
   0x08048085 <+5>: inc ecx
   0x08048086 <+6>: inc ecx
   0x08048087 <+7>: popa
   0x08048088 <+8>: push edi
   0x08048089 <+9>: mov eax,0x1
   0x0804808e <+14>: mov ebx,0x1
   0x08048093 <+19>: int 0x80
End of assembler dump.
(gdb) x/25i $eip
=> 0x8048080 <_start>: push eax
   0x8048081 <_start+1>: push edx
   0x8048082 <_start+2>: dec edi
   0x8048083 <_start+3>: push esi
   0x8048084 <_start+4>: inc ecx
   0x8048085 <_start+5>: inc ecx
   0x8048086 <_start+6>: inc ecx
   0x8048087 <_start+7>: popa
   0x8048088 <_start+8>: push edi
   0x8048089 <_start+9>: mov eax,0x1
   0x804808e <_start+14>: mov ebx,0x1
   0x8048093 <_start+19>: int 0x80
   0x8048095: add BYTE PTR [eax],al
   0x8048097: add BYTE PTR [eax+0x52],dl
   0x804809a: dec edi
   0x804809b: push esi
   0x804809c: inc ecx
   0x804809d: inc ecx
   0x804809e: inc ecx
   0x804809f: inc ecx
   0x80480a0: popa
   0x80480a1: push edi
   0x80480a2: mov eax,0x1
---Type <return> to continue, or q <return> to quit---
   0x80480a7: xor ecx,ecx
   0x80480a9: int 0x80
(gdb)
No trace of that "db 'PROVAAAAaW'" as a string in $esp (initialized data should be in the stack segment, right?).
As you can see, after $eip+19 those fake instructions appear again, but the first two are translated differently.
In the .text section:
Code: [Select]
0x08048080 <+0>: push eax
0x08048081 <+1>: push edx
In the .data section:
Code: [Select]
0x8048095: add BYTE PTR [eax],al
0x8048097: add BYTE PTR [eax+0x52],dl
I thought this happened because I wasn't storing an actual variable, so I tried a second version, with this code:

Code: [Select]
global _start

SECTION .text
_start:
db 'PROVAAAaW'
mov eax, 1
mov ebx, 1
int 80h

SECTION .data
aw db 'PROVAAAAaW'
mov eax, 1
xor ecx, ecx
int 80h
but I get the same result: no trace of "PROVAAAAaW" as a string, nor of aw (except for its normal presence in the symbol table of the ELF object file).
Code: [Select]
utente@utente-virtual-machine:~/Scrivania/programmazione/nasm$ readelf -s prova2

Symbol table '.symtab' contains 10 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 08048080     0 SECTION LOCAL  DEFAULT    1
     2: 08049098     0 SECTION LOCAL  DEFAULT    2
     3: 00000000     0 FILE    LOCAL  DEFAULT  ABS prova2.asm
     4: 08049098     0 NOTYPE  LOCAL  DEFAULT    2 aw
     5: 00000000     0 FILE    LOCAL  DEFAULT  ABS
     6: 08048080     0 NOTYPE  GLOBAL DEFAULT    1 _start
     7: 080490ab     0 NOTYPE  GLOBAL DEFAULT    2 __bss_start
     8: 080490ab     0 NOTYPE  GLOBAL DEFAULT    2 _edata
     9: 080490ac     0 NOTYPE  GLOBAL DEFAULT    2 _end
utente@utente-virtual-machine:~/Scrivania/programmazione/nasm$
Why does this happen? Again, thank you for your patience. I'm trying to understand how nasm itself works rather than actually program something.
Best,
Neuro
Title: Re: Strange Padding with times instruction
Post by: Bryant Keller on November 11, 2013, 05:46:42 PM: Quote from: neurophobos on November 11, 2013, 11:49:09 AM
My question here is: where can I get a chart of equivalent opcodes/ASCII?

There isn't one. The ASCII code chart is available online from a variety of sources, such as this one (http://www.asciitable.com/index/asciifull.gif). Opcodes, however, aren't as easy as ASCII. If you take a look at the x86 Opcode Table (http://sparksandflames.com/files/x86InstructionChart.html), you'll notice that not all instructions are just one byte; many can be much larger (up to 15 bytes including prefixes on IA-32).

Quote from: neurophobos on November 11, 2013, 11:49:09 AM
I suppose it could be possible to write an entire program with a single "db <nonsense string>" (here by "program" I mean a meaningless program with just opcodes in text segment).

Yes, it's possible, but not really recommended.

Quote from: neurophobos on November 11, 2013, 11:49:09 AM
My main doubt is that, based on what I've read here, nasm interprets as instructions the ASCII codes of a db statement like this one. I thought this happens because I define db 'bla bla' in the .text segment, so nasm tries to convert into opcodes what it assumes to be instructions rather than data declarations.

When NASM sees DB, it just emits bytes. It doesn't care about opcodes. When NASM sees a quoted string, it converts it into a list of ASCII bytes, which then get passed to DB.

Quote from: neurophobos on November 11, 2013, 11:49:09 AM
So I gave it a try, creating this:
Code: [Select]
global _start

SECTION .text
_start:
db 'PROVAAAaW'
mov eax, 1
mov ebx, 1
int 80h

SECTION .data
db 'PROVAAAAaW'
mov eax, 1
xor ecx, ecx
int 80h

I have no idea why you did that. You should have probably done something like:

Code: [Select]
global _start

SECTION .text
_start:
mov eax, 1
mov ebx, 1
int 80h

SECTION .data
VAR1: db 'PROVAAAaW'
As you can see, I didn't add any DB stuff to the .text section. With this code, GDB won't fubar your output. Also, the reason I added a label "VAR1" is so that in GDB (as long as you gave both NASM and GCC the '-g' option) you can use:
Code: [Select]
printf "%s\n", &VAR1
to display the contents of that variable.

Quote from: neurophobos on November 11, 2013, 11:49:09 AM
With this, I wouldn't have expected to find the db statement from section .data following the .text code in gdb:

Well, who said that your .data is after your .text? The reason you put things in sections is to group related data together. The linker decides where each section ends up: it might be after your .text, it might be before. Unless you write your own LD scripts (which is beyond the scope of this forum), you don't really have a say in where things end up.

Quote from: neurophobos on November 11, 2013, 11:49:09 AM
Code: [Select]
(gdb) disass
Dump of assembler code for function _start:
=> 0x08048080 <+0>: push eax
   0x08048081 <+1>: push edx
   0x08048082 <+2>: dec edi
   0x08048083 <+3>: push esi
   0x08048084 <+4>: inc ecx
   0x08048085 <+5>: inc ecx
   0x08048086 <+6>: inc ecx
   0x08048087 <+7>: popa
   0x08048088 <+8>: push edi
   0x08048089 <+9>: mov eax,0x1
   0x0804808e <+14>: mov ebx,0x1
   0x08048093 <+19>: int 0x80
End of assembler dump.
(gdb) x/25i $eip
=> 0x8048080 <_start>: push eax
   0x8048081 <_start+1>: push edx
   0x8048082 <_start+2>: dec edi
   0x8048083 <_start+3>: push esi
   0x8048084 <_start+4>: inc ecx
   0x8048085 <_start+5>: inc ecx
   0x8048086 <_start+6>: inc ecx
   0x8048087 <_start+7>: popa
   0x8048088 <_start+8>: push edi
   0x8048089 <_start+9>: mov eax,0x1
   0x804808e <_start+14>: mov ebx,0x1
   0x8048093 <_start+19>: int 0x80
   0x8048095: add BYTE PTR [eax],al
   0x8048097: add BYTE PTR [eax+0x52],dl
   0x804809a: dec edi
   0x804809b: push esi
   0x804809c: inc ecx
   0x804809d: inc ecx
   0x804809e: inc ecx
   0x804809f: inc ecx
   0x80480a0: popa
   0x80480a1: push edi
   0x80480a2: mov eax,0x1
---Type <return> to continue, or q <return> to quit---
   0x80480a7: xor ecx,ecx
   0x80480a9: int 0x80
(gdb)
No trace of that "db 'PROVAAAAaW'" as a string in $esp (initialized data should be in the stack segment, right?).

No. The stack holds what the OS and your code put there at run time (arguments, environment variables, return addresses, locals); uninitialized data normally goes into the .bss section. Initialized data is found at a set offset from the executable's starting point in memory. This location depends on the object format, and the location of your executable in memory (usually) changes each time it's executed.
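
On the earlier question of why the first two characters decode differently after the code: in the x/25i output the disassembly continues past the end of _start into three zero bytes (presumably padding) and then the string. A disassembler that starts on a 00 byte reads it as the opcode of "add" and takes the next bytes as its operands, so 'P' (0x50) and 'R' (0x52) are swallowed as operand bytes instead of appearing as push eax and push edx. The sketch below is an illustration, assuming nasm -f bin -l decode.lst decode.asm with made-up file names; the listing shows the bytes for each line.

```nasm
bits 32
        add [eax], al           ; 00 00
        add [eax+0x52], dl      ; 00 50 52  (0x50 is 'P', 0x52 is 'R')
        dec edi                 ; 4F  'O'
        push esi                ; 56  'V'
        inc ecx                 ; 41  'A'
```

Starting the disassembly at the address of the string itself (0x8048098 in that output) would show push eax and push edx again.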
Title: Re: Strange Padding with times instruction
Post by: Frank Kotler on November 12, 2013, 01:55:41 AM: For one thing, you're looking at the output from gdb and blaming it on Nasm. I assure you that Nasm put your string in your code, but you didn't ask gdb to display it as a string, you asked gdb to disassemble it... so it did. I don't know if there's a way to ask gdb to display ascii text or not - I imagine there is. I'm not very good with gdb!

There used to be a Unix utility called "dump" but it does not exist on my system, and "apt-get" is being "apt-don't-get". I also wrote my own version, but apparently that got left behind on my "old" machine. I'll have to go back and fetch it, or write it over. Wasn't very difficult, but I'm not in the mood right now. I think seeing "just the bytes" - which is what Nasm produces - would help you understand what's going on here. I'll get back to ya on this "if the spirit moves me". Hang in there!
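
One way to see "just the bytes", sketched under the assumption that nasm and a hex dump tool such as hexdump or xxd are installed: assemble a flat binary with a listing file and compare the two. The file names are made up for the example. In gdb, the x/s command (for example x/s &aw) displays memory as a string rather than disassembling it.

```nasm
; bytes.asm (hypothetical file name)
; Assumed commands: nasm -f bin -l bytes.lst -o bytes.bin bytes.asm
;                   hexdump -C bytes.bin
bits 32
        db 'PROVA'      ; 50 52 4F 56 41
        push eax        ; 50  'P'
        push edx        ; 52  'R'
        dec edi         ; 4F  'O'
        push esi        ; 56  'V'
        inc ecx         ; 41  'A'
        times 3 db '_'  ; 5F 5F 5F
```

The dump should show the same five byte values twice: once from the string and once from the instructions.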

Best,
Frank
Title: Re: Strange Padding with times instruction
Post by: neurophobos on November 12, 2013, 03:13:16 PM: So, everything I write quoted is translated in ASCII.
While in memory there are just bytes, it's being displayed in gdb as instructions because I'm disassembling x/i, so gdb tries to (and not nasm actually did) convert those ascii bytes interpreting what they could mean (I suppose gdb has an array of symbols to read from to map opcodes it displays disassembling programs).
It's a coincidence that those single characters are all equivalent to unary operations

Quote
50-> PUSH eax   ;P
52-> PUSH edx   ;R
4F-> DEC edi   ;O
56-> PUSH esi   ;V
61-> POPA   ;a
57-> PUSH edi   ;W

otherwise I'd have found incomplete binary opcodes (or complete ones, since gdb may pick up more bytes to make a sensible interpretation?).
DB is a directive: it stores what I tell it to store, as it is. There's no conversion into an opcode because it's not an operation. Operations, on the other hand, are processed by nasm and converted into their corresponding hex values.
I bet that the loader understands what is an instruction and what is an operand (since the hex value combinations may coincide or overlap) because a conversion/fetching protocol is followed.
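
A small sketch of how instruction and operand bytes are told apart (an illustration, not from the thread): the bytes themselves carry no marker. Decoding simply starts at a known address, such as the entry point, and each opcode determines how many of the following bytes belong to it. The assumed command is nasm -f bin -l operand.lst operand.asm, with made-up file names.

```nasm
bits 32
        mov eax, 0x5F5F5F5F     ; B8 5F 5F 5F 5F: one instruction, the 5F bytes are its operand
        pop edi                 ; 5F: the same byte value, here decoded as an opcode
```

Disassembling the same output starting one byte later would skip the B8 opcode and show five "pop edi" instructions instead.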

You've been very helpful and kind. Thank you all very much for your explanations.
Neuro