First program, segmentation fault (cat, linux)

NASM Forum > Programming with NASM

XenoReseller:
This was my first shot at assembly after reading a couple of pages from various sources, but I'm getting a segmentation fault. For all I know, my code could be COMPLETELY screwed up. Any help?
--- Code: ---SECTION .bss ; mutable/modifiable variables
msg: resb 1

SECTION .text
global _start

_start:
call read
call write
mov eax, msg
cmp eax, 10
jne _start
mov eax, 1
int 80h

read:
mov eax, 3
push 1
mov ebx, [msg]
push ebx
push 0
int 80h
ret

write:
mov eax, 4
push 1
mov ebx, [msg]
push ebx
push 1
int 80h
ret
--- End code ---

I got it working without using push:

--- Code: ---SECTION .bss ; mutable/modifiable variables
msg: resb 1

SECTION .text
global _start
_start:
call read
call write
mov eax, msg
cmp eax, 10
jne _start
mov eax, 1
int 80h

read:
mov eax, 3
mov ebx, 0
mov ecx, msg
mov edx, 1
int 80h
ret

write:
mov eax, 4
mov ebx, 1
mov ecx, msg
mov edx, 1
int 80h
ret
--- End code ---

Frank Kotler:
No, your code isn't COMPLETELY screwed up. The "exit" is right. :)

It does have some problems!

--- Code: ---SECTION .bss ; mutable/modifiable variables
msg: resb 1

SECTION .text
global _start

_start:
call read
call write
mov eax, msg

--- End code ---

This puts the address of msg into eax. It's 0x8049000-and-some - never 10. You want "[msg]", ("[contents]" of memory) but I'm not sure that'll do what you want, either...

--- Code: --- cmp eax, 10
jne _start
mov eax, 1
int 80h

--- End code ---

Okay, but now you go into a weird mix of C and sys_call calling conventions...

--- Code: --- read:
mov eax, 3
push 1
mov ebx, [msg]
push ebx
push 0
int 80h
ret

--- End code ---

Here's where your segfault comes from. It is not obvious to beginners, but "call" and "ret" use the stack. "call read" puts the return address (the address of the instruction right after the call) on the stack. When you get to "ret", it pops whatever is on top of the stack and jumps there. Since the last thing you pushed was 0, you "return" to address 0... program go "boom". :)
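
Put another way, anything a subroutine pushes has to come off the stack again before its "ret". Here's a minimal sketch of that, not taken from your program: the label "balanced", the file name and the value 7 are made up for illustration.

--- Code: ---; nasm -f elf32 balanced.asm
; ld -m elf_i386 balanced.o -o balanced
SECTION .text
global _start

_start:
mov ebx, 7 ; value we want to survive the call
call balanced ; pushes the return address
mov eax, 1 ; sys_exit - ebx (still 7) is the exit status
int 80h

balanced:
push ebx ; save ebx - top of stack is now the saved value
mov ebx, 0 ; ... use ebx for something else ...
pop ebx ; restore ebx - top of stack is the return address again
ret ; pops the return address that "call" pushed

--- End code ---

If you take out the "pop ebx", the "ret" would jump to address 7 instead of back into _start.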

The way a "read" should go is more like...

--- Code: ---read:
mov eax, 3 ; system call number for "read"
mov ebx, 0 ; file descriptor to read from 0=stdin
mov ecx, msg ; address of our buffer
mov edx, 1 ; how many (max) bytes to read
int 80h ; call kernel
ret ; since we didn't push anything, this is okay

--- End code ---

What really happens here is that sys_read on stdin (when stdin is a terminal in its usual line-buffered mode) doesn't return until you hit a linefeed (the 10 you're looking for). If the user types "abc(enter)", the system call doesn't return until we hit "enter", but since we only asked for one character, the "a" goes into the buffer (msg), and the "bc(enter)" stays in the OS's input buffer. If you exited right now, the shell would read the leftover "bc" and try to run it as a command ("bc" is some sort of calculator, apparently). Since you go back and "call read" again, it'll get "b" the next time, then "c", then the 10 you're looking for. So it'll work, but doesn't really do what it appears to.
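
Since the whole line is waiting in the OS's buffer anyway, another way to go is to ask for more than one byte and echo back exactly what sys_read says it delivered. This is only a sketch, not a fix to your program: "buf", "BUFSIZE", the 64-byte size and the file name are assumptions for illustration. sys_read returns the number of bytes it actually read in eax.

--- Code: ---; nasm -f elf32 echoline.asm
; ld -m elf_i386 echoline.o -o echoline
BUFSIZE equ 64 ; hypothetical buffer size

SECTION .bss
buf: resb BUFSIZE

SECTION .text
global _start

_start:
mov eax, 3 ; sys_read
mov ebx, 0 ; stdin
mov ecx, buf ; address of buffer
mov edx, BUFSIZE ; read at most this many bytes
int 80h ; eax = bytes read (0 = end of input, negative = error)

cmp eax, 0
jle exit ; nothing to echo on end of input or error

mov edx, eax ; write only as many bytes as we actually read
mov eax, 4 ; sys_write
mov ebx, 1 ; stdout
mov ecx, buf
int 80h

exit:
mov eax, 1 ; sys_exit
xor ebx, ebx ; exit status 0
int 80h

--- End code ---

A line longer than BUFSIZE would still leave its tail in the OS's buffer, so a real program would loop.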

There's a way to make sys_read return without waiting for the "enter", but it's complicated, and I'm not satisfied with the way I do it. :(

When you look for the 10, you presumably meant:

--- Code: ---mov eax, [msg]

--- End code ---
Well... you actually loaded the address, not the contents, but even with the brackets, using eax loads the byte at msg and the next three bytes! Chances are the next three bytes are 0s in this case, so it'll probably work, but you probably want to use al to get just the one byte you've got in the buffer:

--- Code: ---mov al, [msg]
cmp al, 10
...

--- End code ---

You've got a similar problem in "write"...

--- Code: --- write:
mov eax, 4
push 1
mov ebx, [msg]
push ebx
push 1
int 80h
ret

--- End code ---

This, too, would segfault if you got to it. More like...

--- Code: ---write:
mov eax, 4 ; sys_call number for "sys_write"
mov ebx, 1 ; file descriptor 1=stdout
mov ecx, msg ; address of buffer
mov edx, 1 ; how many?
int 80h
ret

--- End code ---
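
Putting the pieces together, the whole program might look something like this. It's a sketch assembled from the fragments above. The check of sys_read's return value, the "done" label, the zero exit status and the file name are my additions, not something your original had.

--- Code: ---; nasm -f elf32 cat.asm
; ld -m elf_i386 cat.o -o cat
SECTION .bss
msg: resb 1

SECTION .text
global _start

_start:
call read
cmp eax, 1 ; did sys_read give us exactly one byte?
jne done ; 0 = end of input, negative = error (added check)
call write
mov al, [msg] ; "[contents]" of memory - just a byte
cmp al, 10 ; linefeed?
jne _start ; no - go get the next character
done:
mov eax, 1 ; sys_exit
xor ebx, ebx ; exit status 0 (added)
int 80h

read:
mov eax, 3 ; sys_read
mov ebx, 0 ; stdin
mov ecx, msg ; address of our buffer
mov edx, 1 ; how many (max) bytes to read
int 80h ; returns byte count (or error) in eax
ret

write:
mov eax, 4 ; sys_write
mov ebx, 1 ; stdout
mov ecx, msg ; address of buffer
mov edx, 1 ; how many?
int 80h
ret

--- End code ---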

See if that helps. There are some tutorials on the subject here:

http://asm.sourceforge.net/resources.html#tutorials

Best,
Frank

Frank Kotler:
Ah! I should have waited for the edit! :)

Okay, you figured it out. Still better off with:

--- Code: ---mov al, [msg] ; "[contents]" of memory - just a byte
cmp al, 10
jne _start
...

--- End code ---

Best,
Frank

XenoReseller:

--- Quote from: Frank Kotler on January 16, 2012, 03:41:11 AM ---Ah! I should have waited for the edit! :)

Okay, you figured it out. Still better off with:

--- Code: ---mov al, [msg] ; "[contents]" of memory - just a byte
cmp al, 10
jne _start
...

--- End code ---

Best,
Frank

--- End quote ---
No, I'm glad you went through the original code. Now I know that the stack plays a key role with calling/returning. The comparison was still eluding me though! Thanks very much!

Another question, it appears that registers are used for syscalls, correct? What else is the stack used for?

Frank Kotler:
Correct, syscalls use registers - in Linux. If you were using BSD, parameters would be passed on the stack... and we'd call the int 80h kinda like...

--- Code: ---push 1 ; length
push msg
push 1 ; stdout
mov eax, 4 ; sys_write
call kernel_call
add esp, 12 ; "remove" parameters from stack
...

kernel_call:
int 80h
ret

--- End code ---
BSD's int 80h doesn't actually need the return address to be there - but it expects the parameters to be in a position on the stack as if it were. We can eliminate the call-and-return...

--- Code: ---push 1 ; length
push msg
push 1 ; stdout
mov eax, 4 ; sys_write
push eax ; or any "dummy" value
int 80h
add esp, 16 ; "remove" parameters from stack

--- End code ---
I've never run BSD, but I'm "pretty sure" that works.

If you were calling a C function, parameters would be passed on the stack:

--- Code: ---; nasm -f elf32 hwc.asm
; gcc hwc.o -o hwc -m32
; (only need the "-m32" on 64-bit systems)
; ./hwc

global main
extern printf

section .data
fmtstr db 'Hello, World!',10,0
section .text
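main:

pushad ; save all general-purpose registers

push dword fmtstr ; printf's only parameter: address of the format string
call printf
add esp,4 ; caller "removes" the parameter (C convention)

popad ; restore registers (eax too, so main returns whatever eax held on entry)

ret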

--- End code ---

If we were using Windows, the (32-bit) Windows API also expects parameters on the stack, but there's a subtle difference. Windows APIs are "stdcall", in which the callee cleans up the stack. We wouldn't need the "add esp, N" after the call: the API (whose code we don't normally see) ends in "ret N" instead of a plain "ret", which "removes" N bytes of parameters for us. Quite a convenient calling convention, actually. The catch is that the called function needs to know how many parameters were passed, so it won't work for something like "printf", which can take a variable number of parameters!
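
To show just the "callee cleans up" mechanics, here's a minimal sketch that runs on Linux. It is not a Windows API call, and "add_two", the file name and the values are made up for illustration.

--- Code: ---; nasm -f elf32 stdcall.asm
; ld -m elf_i386 stdcall.o -o stdcall
; ./stdcall ; echo $?
SECTION .text
global _start

_start:
push 2 ; second parameter
push 40 ; first parameter
call add_two ; callee removes both parameters...
; ...so no "add esp, 8" here
mov ebx, eax ; exit status = 40 + 2
mov eax, 1 ; sys_exit
int 80h

add_two:
mov eax, [esp + 4] ; first parameter (return address is at [esp])
add eax, [esp + 8] ; second parameter
ret 8 ; return, then "remove" 8 bytes of parameters

--- End code ---

The "echo $?" should show 42.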

We can also use the stack for "local" (or "stack" or "automatic") variables...

--- Code: ---my_thing:
; set up a "stack frame"
push ebp ; save caller's ebp
mov ebp, esp ; "frame pointer"
sub esp, 4 ; room for a single local variable
; initialize the local variable
mov dword [ebp - 4], 42
; do some stuff...
mov eax, [ebp - 4] ; "return 42" (return value goes in eax)
; destroy the stack frame - this "frees" memory used for our local variable
mov esp, ebp
pop ebp ; restore caller's ebp
ret

--- End code ---
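(We need to be careful not to alter ebp while all this is going on!) Note that, with this frame, passed parameters (if any) are at ebp + 8, ebp + 12, ebp + 16, etc. ([ebp] holds the caller's saved ebp and [ebp + 4] the return address), and local variables are at ebp - 4, etc.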
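
Here's a minimal sketch that uses both sides of the frame at once: two parameters above ebp and one local below it, with the caller cleaning up as in the C convention. "sum_two", the file name and the values are hypothetical.

--- Code: ---; nasm -f elf32 frame.asm
; ld -m elf_i386 frame.o -o frame
; ./frame ; echo $?
SECTION .text
global _start

_start:
push 12 ; second parameter
push 30 ; first parameter
call sum_two
add esp, 8 ; caller "removes" the parameters
mov ebx, eax ; exit status = 30 + 12
mov eax, 1 ; sys_exit
int 80h

sum_two:
push ebp ; save caller's ebp
mov ebp, esp ; "frame pointer"
sub esp, 4 ; room for a single local variable
mov eax, [ebp + 8] ; first parameter
add eax, [ebp + 12] ; second parameter
mov [ebp - 4], eax ; keep the sum in the local variable
mov eax, [ebp - 4] ; return value goes in eax
mov esp, ebp ; "free" the local variable
pop ebp ; restore caller's ebp
ret

--- End code ---

The "echo $?" should show 42.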

There's also push and pop, but you knew that. I think that's everything the stack is ordinarily used for...

Best,
Frank
