Author Topic: First program, segmentation fault (cat, linux)  (Read 23805 times)

Offline XenoReseller

  • Jr. Member
  • *
  • Posts: 7
First program, segmentation fault (cat, linux)
« on: January 16, 2012, 01:44:47 AM »
This was my first shot at assembly after reading a couple of pages from various sources, but I'm getting a segmentation fault. For all I know, my code could be COMPLETELY screwed up. Any help?
Code: [Select]
SECTION .bss  ; mutable/modifiable variables
msg: resb 1


SECTION .text
global _start

_start:
call read
call write
mov eax, msg
cmp eax, 10
jne _start
mov eax, 1
int 80h

read:
mov eax, 3
push 1
mov ebx, [msg]
push ebx
push 0
int 80h
ret

write:
mov eax, 4
push 1
mov ebx, [msg]
push ebx
push 1
int 80h
ret

I got it working without using push:

Code: [Select]
SECTION .bss  ; mutable/modifiable variables
msg: resb 1


SECTION .text
global _start
_start:
call read
call write
mov eax, msg
cmp eax, 10
jne _start
mov eax, 1
int 80h

read:
mov eax, 3
mov ebx, 0
mov ecx, msg
mov edx, 1
int 80h
ret

write:
mov eax, 4
mov ebx, 1
mov ecx, msg
mov edx, 1
int 80h
ret
« Last Edit: January 16, 2012, 03:20:44 AM by XenoReseller »

Offline Frank Kotler

  • NASM Developer
  • Hero Member
  • *****
  • Posts: 2667
  • Country: us
Re: First program, segmentation fault (cat, linux)
« Reply #1 on: January 16, 2012, 03:33:30 AM »
No, your code isn't COMPLETELY screwed up. The "exit" is right. :)

It does have some problems!

Code: [Select]
SECTION .bss  ; mutable/modifiable variables
msg: resb 1


SECTION .text
global _start

_start:
call read
call write
mov eax, msg

This puts the address of msg into eax. It's 0x8049000-and-some - never 10. You want "[msg]", ("[contents]" of memory) but I'm not sure that'll do what you want, either...

Code: [Select]
cmp eax, 10
jne _start
mov eax, 1
int 80h

Okay, but now you go into a weird mix of C and sys_call calling conventions...

Code: [Select]
read:
mov eax, 3
push 1
mov ebx, [msg]
push ebx
push 0
int 80h
ret

Here's where your segfault comes from. It is not obvious to beginners, but "call" and "ret" use the stack. "call read" put the return address (the address right after this instruction) on the stack. When you get to "ret", you return to the address on the stack. Since you pushed 0... program go "boom". :)

The way a "read" should go is more like...

Code: [Select]
read:
    mov eax, 3 ; system call number for "read"
    mov ebx, 0 ; file descriptor to read from 0=stdin
    mov ecx, msg ; address of our buffer
    mov edx, 1 ; how many (max) bytes to read
    int 80h ; call kernel
    ret ; since we didn't push anything, this is okay

What really happens here is that sys_read on stdin (a terminal in its default line-buffered mode) doesn't return until you hit a linefeed (the 10 you're looking for). If the user types "abc(enter)", the system call doesn't return until we hit "enter", but since we only asked for one character, the "a" goes into the buffer (msg) and the "bc(enter)" stays in the OS's input buffer. If you exited right now, the shell would read that leftover "bc(enter)" and try to run the program "bc" (some sort of calculator, apparently). Since you go back and "call read" again, it'll get "b" the next time, then "c", then the 10 you're looking for. So it'll work, but doesn't really do what it appears to.
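To make the buffering behaviour above concrete, here is a rough sketch (not from the original posts) that asks for up to 128 bytes per call instead of one, and writes back however many bytes sys_read reports in eax. The buffer name buf, its size, and the file name in the build comment are made up for illustration.

```nasm
; nasm -f elf32 linecat.asm && ld -m elf_i386 linecat.o -o linecat
; ("linecat" is just a hypothetical file name)
SECTION .bss
buf: resb 128               ; hypothetical buffer, size chosen arbitrarily

SECTION .text
global _start

_start:
    mov eax, 3              ; sys_read
    mov ebx, 0              ; stdin
    mov ecx, buf            ; address of buffer
    mov edx, 128            ; max bytes to read
    int 80h
    cmp eax, 0              ; eax = bytes read; 0 = end of file, negative = error
    jle done

    mov edx, eax            ; write exactly as many bytes as we got
    mov eax, 4              ; sys_write
    mov ebx, 1              ; stdout
    mov ecx, buf
    int 80h
    jmp _start              ; go back for the next chunk

done:
    mov eax, 1              ; sys_exit
    xor ebx, ebx            ; exit status 0
    int 80h
```

Typed at a terminal, each sys_read here should come back with a whole line at once, linefeed included. The sketch does not handle a short write; a sturdier version would loop until all bytes are written.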

There's a way to make sys_read return without waiting for the "enter", but it's complicated, and I'm not satisfied with the way I do it. :(

When you look for the 10, what you meant to do was:
Code: [Select]
mov eax, [msg]
(what you actually wrote loads the address, not the contents). But using eax gets the contents of msg - and the next three bytes! Chances are, the next three bytes are 0s in this case, so it'll probably work, but you probably want to use al to get just the one byte you've got in the buffer:
Code: [Select]
mov al, [msg]
cmp al, 10
...

You've got a similar problem in "write"...

Code: [Select]
write:
mov eax, 4
push 1
mov ebx, [msg]
push ebx
push 1
int 80h
ret

This, too, would segfault if you got to it. More like...
Code: [Select]
write:
    mov eax, 4 ; sys_call number for "sys_write"
    mov ebx, 1 ; file descriptor 1=stdout
    mov ecx, msg ; address of buffer
    mov edx, 1 ; how many?
    int 80h
    ret

See if that helps. There are some tutorials on the subject here:

http://asm.sourceforge.net/resources.html#tutorials

Best,
Frank


Offline Frank Kotler

  • NASM Developer
  • Hero Member
  • *****
  • Posts: 2667
  • Country: us
Re: First program, segmentation fault (cat, linux)
« Reply #2 on: January 16, 2012, 03:41:11 AM »
Ah! I should have waited for the edit! :)

Okay, you figured it out. Still better off with:
Code: [Select]
mov al, [msg] ; "[contents]" of memory - just a byte
cmp al, 10
jne _start
...
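
Putting the pieces from this thread together, a complete version might look something like the sketch below. The check on the value sys_read returns in eax (so the loop ends at end of file) and the "done" label are additions that were not in the original code; the file name in the build comment is hypothetical.

```nasm
; nasm -f elf32 cat1.asm && ld -m elf_i386 cat1.o -o cat1
; ("cat1" is just a hypothetical file name)
SECTION .bss
msg: resb 1                 ; one-byte buffer

SECTION .text
global _start

_start:
    call read
    cmp eax, 1              ; sys_read returns the number of bytes read
    jne done                ; 0 = end of file, negative = error (added check)
    call write
    mov al, [msg]           ; "[contents]" of memory - just a byte
    cmp al, 10              ; linefeed?
    jne _start
done:
    mov eax, 1              ; sys_exit
    xor ebx, ebx            ; exit status 0
    int 80h

read:
    mov eax, 3              ; sys_read
    mov ebx, 0              ; stdin
    mov ecx, msg            ; address of buffer
    mov edx, 1              ; one byte
    int 80h
    ret                     ; nothing pushed, so the return address is on top

write:
    mov eax, 4              ; sys_write
    mov ebx, 1              ; stdout
    mov ecx, msg            ; address of buffer
    mov edx, 1              ; one byte
    int 80h
    ret
```

Note that "write" overwrites eax with its own return value, which is why the byte is reloaded from [msg] after the call rather than kept in a register.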

Best,
Frank


Offline XenoReseller

  • Jr. Member
  • *
  • Posts: 7
Re: First program, segmentation fault (cat, linux)
« Reply #3 on: January 16, 2012, 03:52:55 AM »
No, I'm glad you went through the original code. Now I know that the stack plays a key role in calling and returning. The comparison was still eluding me, though! Thanks very much!

Another question, it appears that registers are used for syscalls, correct? What else is the stack used for?
« Last Edit: January 16, 2012, 03:57:54 AM by XenoReseller »

Offline Frank Kotler

  • NASM Developer
  • Hero Member
  • *****
  • Posts: 2667
  • Country: us
Re: First program, segmentation fault (cat, linux)
« Reply #4 on: January 16, 2012, 01:31:23 PM »
Correct, syscalls use registers - in Linux. If you were using BSD, parameters would be passed on the stack... and we'd call the int 80h kinda like...

Code: [Select]
push 1 ; length
push msg
push 1 ; stdout
mov eax, 4 ; sys_write
call kernel_call
add esp, 12 ; "remove" parameters from stack
...

kernel_call:
int 80h
ret
BSD's int 80h doesn't actually need the return address to be there - but it expects the parameters to be in a position on the stack as if it were. We can eliminate the call-and-return...

Code: [Select]
push 1 ; length
push msg
push 1 ; stdout
mov eax, 4 ; sys_write
push eax ; or any "dummy" value
int 80h
add esp, 16 ; "remove" parameters from stack
I've never run BSD, but I'm "pretty sure" that works.

If you were calling a C function, parameters would be passed on the stack:

Code: [Select]
; nasm -f elf32 hwc.asm
; gcc hwc.o -o hwc -m32
; (only need the "-m32" on 64-bit systems)
; ./hwc

global main
extern printf

section .data
    fmtstr db 'Hello, World!',10,0
section .text
    main:
   
    pushad
   
    push dword fmtstr
    call printf
    add esp,4
   
    popad
   
    ret
   

If we were using Windows, the Windows API expects parameters on the stack, but there's a subtle difference. Windows APIs are "stdcall", in which "callee cleans up stack". We wouldn't need the "add esp, ?" after the call - the API (which we don't normally see code for) ends in "ret ?" instead of a plain "ret", which "removes" the parameters for us. Quite a convenient calling convention, actually. The called function needs to know how many parameters were passed - won't work for something like "printf" which can take a variable number of parameters!
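
As a rough sketch of the "callee cleans up stack" idea, here is a stdcall-style routine written as a plain Linux program so it can be tried without Windows. The routine name add_two and the file name in the build comment are made up for illustration; no real Windows API is shown.

```nasm
; nasm -f elf32 stdcall.asm && ld -m elf_i386 stdcall.o -o stdcall
; ("stdcall" is just a hypothetical file name)
SECTION .text
global _start

_start:
    push 2                  ; second parameter
    push 40                 ; first parameter
    call add_two            ; note: no "add esp, 8" after the call
    mov ebx, eax            ; use the sum as the exit status
    mov eax, 1              ; sys_exit
    int 80h

; add_two (hypothetical): returns the sum of its two dword parameters in eax
add_two:
    push ebp
    mov ebp, esp
    mov eax, [ebp + 8]      ; first parameter (40)
    add eax, [ebp + 12]     ; second parameter (2)
    pop ebp
    ret 8                   ; return AND remove 8 bytes of parameters
```

Running it and then "echo $?" in the shell should show 42. If the "ret 8" were a plain "ret", the caller would need "add esp, 8" after the call, which is the C convention.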

We can also use the stack for "local" (or "stack" or "automatic") variables...

Code: [Select]
my_thing:
; set up a "stack frame"
    push ebp ; save caller's ebp
    mov ebp, esp ; "frame pointer"
    sub esp, 4 ; room for a single local variable
; initialize the local variable
    mov dword [ebp - 4], 42
; do some stuff...
    mov eax, [ebp - 4] ; "return 42" (return value goes in eax)
; destroy the stack frame - this "frees" memory used for our local variable
    mov esp, ebp
    pop ebp ; restore caller's ebp
    ret
(we need to be careful not to alter ebp while all this is going on!) Note that passed parameters (if any) are at ebp + 8, 12, 16, etc. and local variables are at ebp - 4, etc.
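
To show the parameter and local-variable offsets side by side, here is a small sketch using the C-style "caller cleans up" convention. The routine name add_local and the file name in the build comment are hypothetical.

```nasm
; nasm -f elf32 frame.asm && ld -m elf_i386 frame.o -o frame
; ("frame" is just a hypothetical file name)
SECTION .text
global _start

_start:
    push 40                 ; the one parameter
    call add_local
    add esp, 4              ; caller removes the parameter
    mov ebx, eax            ; use the result as the exit status
    mov eax, 1              ; sys_exit
    int 80h

; add_local (hypothetical): returns its parameter plus a local variable
add_local:
    push ebp                ; save caller's ebp
    mov ebp, esp            ; frame pointer
    sub esp, 4              ; room for one local variable
    mov dword [ebp - 4], 2  ; local variable lives below ebp
    mov eax, [ebp + 8]      ; parameter lives above saved ebp and return address
    add eax, [ebp - 4]
    mov esp, ebp            ; free the local variable
    pop ebp                 ; restore caller's ebp
    ret
```

The layout inside the routine is: [ebp] holds the caller's ebp, [ebp + 4] the return address, [ebp + 8] the first parameter, and [ebp - 4] the first local.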

There's also push and pop, but you knew that. I think that's everything the stack is ordinarily used for...

Best,
Frank


Offline XenoReseller

  • Jr. Member
  • *
  • Posts: 7
Re: First program, segmentation fault (cat, linux)
« Reply #5 on: January 16, 2012, 05:59:05 PM »
Thanks, just one more question... I'm trying to create my own stack using a 256-byte variable. I have a 1-byte variable for the offset and each element is 4 bytes long so I increment the offset by 4. I then try:

Code: [Select]
mov eax, stack ; move stack address into eax
add eax, [offset] ; add offset address to eax

At this point I'm confused as to how to access the address in eax. [var] is the contents while var is the address. But registers don't have the same addresses so what does [eax] and eax mean?
« Last Edit: January 16, 2012, 06:14:53 PM by XenoReseller »

Offline Frank Kotler

  • NASM Developer
  • Hero Member
  • *****
  • Posts: 2667
  • Country: us
Re: First program, segmentation fault (cat, linux)
« Reply #6 on: January 16, 2012, 08:31:46 PM »
I'm not sure I understand the question, but I think you've got a "problem" here...

Code: [Select]
mov eax, stack ; move stack address into eax
add eax, [offset] ; add offset address to eax

If I understand you, "offset" is a one-byte variable. But adding to eax will use four bytes (32 bits) - your one byte of "offset" plus the following three bytes (which may be zeros... if we're lucky). I would suggest using a dword for "offset", even if your intended largest value is 255. Ummm... maybe something like...

Code: [Select]
section .data
    offset dd 0 ; a dword, so "add eax, [offset]" reads only our variable
section .bss
    stack resd 256 ; 256 dwords = 1024 bytes
section .text
    mov eax, stack
    add eax, [offset]
; now put something on the "stack"
    mov dword [eax], 42
    add dword [offset], 4

; do other things

; now get our value back off the "stack"
    sub dword [offset], 4 ; step back to the last value pushed
    mov eax, stack
    add eax, [offset]
    mov ebx, [eax]

Now we should have 42 in ebx, and the "stack" (plus its "offset") should be back where it was when we started. In my implementation, the "stack" works upwards to higher memory, whereas the real "stack" works downward from the top of memory. This may not be what you have in mind. Maybe a more concrete, or more complete example of what you're trying to do would clarify my mind (or maybe not... :) )

In any case, "[eax]" would refer to the contents of memory at the address held in eax. Note that in some cases, we need to specify the size of the operation - when Nasm can't tell by the size of the register we're using...
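
A short sketch of the difference, assuming a made-up dword variable named buf (the file name in the build comment is hypothetical too):

```nasm
; nasm -f elf32 brackets.asm && ld -m elf_i386 brackets.o -o brackets
; ("brackets" is just a hypothetical file name)
SECTION .bss
buf: resd 1                 ; hypothetical dword variable

SECTION .text
global _start

_start:
    mov eax, buf            ; eax = address of buf (no memory access)
    mov dword [eax], 42     ; store to memory at that address;
                            ; "dword" is required, no register gives the size
    mov ebx, [eax]          ; load 4 bytes: size implied by ebx
    mov bl, [eax]           ; load 1 byte: size implied by bl
    lea ecx, [eax + 4]      ; brackets, but only address arithmetic: ecx = eax + 4

    mov eax, 1              ; sys_exit, exit status is whatever is in ebx
    int 80h
```

So "eax" is the number in the register, and "[eax]" is whatever is stored in memory at the address that number names.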

I don't know if that really answers your question or not. Don't hesitate to ask for clarification!

Best,
Frank


Offline XenoReseller

  • Jr. Member
  • *
  • Posts: 7
Re: First program, segmentation fault (cat, linux)
« Reply #7 on: January 17, 2012, 01:53:04 AM »
Well, here was my broken (really borked) stack at the time I wrote that question.

tes: was just a label I was using to test the stack.
Depending on what I changed it to, I'd get segfaults. Other times, it would just print the same number three times, and it wasn't any of the numbers it should have been.

Also, I noticed you did add dword [offset], 4 even though it was declared in the .data section? I thought variables declared there were constants?

I really appreciate this, you've been very helpful.
Code: [Select]
%define sys_write 4

section .bss
stack: resb 256
offset: resb 1
value: resb 4

section .text
global _start

_start:
call tes
mov eax, 1
int 80h

psh:
mov eax, stack
mov eax, [offset]
mov eax, [value]
add byte[offset], 0x04
ret
pp:
sub byte[offset], 0x04
mov eax, stack
add eax, [offset]
mov ebx, value
mov [ebx], eax
ret
write:
mov eax, sys_write
mov ebx, 1
mov ecx, [value]
mov edx, 1
int 80h
ret
tes:
mov dword[value], '1'
call psh
mov dword[value], '2'
call psh
mov dword[value], '3'
call psh
call pp
call write
call pp
call write
call pp
call write
ret
« Last Edit: January 17, 2012, 01:56:00 AM by XenoReseller »

Offline Mathi

  • Jr. Member
  • *
  • Posts: 82
  • Country: in
    • Win32NASM
Re: First program, segmentation fault (cat, linux)
« Reply #8 on: January 17, 2012, 03:42:51 AM »
I corrected some problems in your routine (not tested).

Quote
Also, I noticed you did add dword [offset], 4 even though it was declared in the .data section? I thought variables declared there were constants?

No they are NOT constants. (you can read/write to the memory address offset)

It is better to initialize content at memory address 'offset' to zero. (bss => uninitialized variable declaration)

Looking at the number of bytes you reserved, you should be able to push and pop 64 values (256/4).

It is just a matter of time for you to grasp the difference between memory and memory contents  :)

mov eax, ebx   ;; copy value in ebx register to eax register
mov [eax], ebx ;; copy value in ebx register to memory location pointed by eax.. In this case eax is assumed to have a memory address as its value(an address to which the program has access to read/write. otherwise it will result in seg fault).

When you use square brackets in your instruction , you are trying to access memory.

** Except for LEA instruction. (Load effective address doesn't deal with Memory contents).

It is better to specify the size of data you are copying/dealing with (byte, word, dword) when you use []

like ,

add eax, byte [offset]   ;; add 1 byte data from memory address offset to eax
or
add eax, word [offset]
or
add eax, dword [offset] 

depending on your intent.


Code: [Select]
%define sys_write 4

section .bss
stack: resb 256
offset: resb 1
value: resb 4

section .text
global _start

_start:
mov byte[offset],0 ;;; better initialize to 0
call tes
mov eax, 1
int 80h

psh:
mov eax, stack
add eax, byte [offset]
mov ebx, dword [value]
mov [eax],ebx
add byte[offset], 0x04
ret
pp:
sub byte[offset], 0x04
mov eax, stack
add eax, byte [offset]
mov ebx, [eax]
mov dword [value],ebx
ret
write:
mov eax, sys_write
mov ebx, 1
mov ecx, [value]
mov edx, 1
int 80h
ret
tes:
mov dword[value], '1'
call psh
mov dword[value], '2'
call psh
mov dword[value], '3'
call psh
call pp
call write
call pp
call write
call pp
call write
ret

All the best.

Thanks,
Mathi.
« Last Edit: January 17, 2012, 03:53:27 AM by Mathi »

Offline XenoReseller

  • Jr. Member
  • *
  • Posts: 7
Re: First program, segmentation fault (cat, linux)
« Reply #9 on: January 17, 2012, 04:26:34 AM »
Thanks, I appreciate the input. I understand the theory, but I don't actually know much syntax. So I can figure out these things in my head, but I forget what syntax does what (i.e. [eax] vs. eax). And in most cases I don't even know half these things exist.

I'll take a look at the code. Thanks very much. After modifying your code to fix the operand size mismatch errors on the adds, it assembles, but it segfaults when run.

Here is the code with my comments to show my logic. It seems sound... but obviously this is where theory starts to matter less.
Code: [Select]
%define sys_write 4
section .bss
stack: resb 256 ;256 byte stack, room for 64, 4-byte elements
offset: resb 1 ;1-byte offset
value: resb 4 ;4-byte element

section .text
global _start

_start:
mov byte[offset],0 ;;; better initialize to 0
call tes ;test the routines
mov eax, 1 ;exit syscall
mov ebx, 0 ;0, no error
int 80h ; kernel interrupt

psh: ;push routine
mov eax, stack ;eax now contains the stack address
add eax, dword [offset] ;eax contains address of current stack item (stack address + offset)

mov ebx, dword [value] ;ebx contains the current value to push
mov [eax],ebx ;value(address specified in eax) now contains the value that was to be pushed
add byte [offset], 4 ;increment offset for next element
ret
pp: ;pop routine
sub byte [offset], 4 ;offset needs to be decremented to access last written element

mov eax, stack ;eax now contains the stack address
add eax, dword [offset] ;eax contains address of current stack item (stack address + offset)

mov ebx, dword [eax] ;ebx contains the value to pop
mov dword [value],ebx ;value now contains the 'popped' value
ret
write:
mov eax, sys_write ;write syscall
mov ebx, 1 ;stdout
mov ecx, [value] ;value to write
mov edx, 1 ;single byte for testing
int 80h ; kernel interrupt
ret
tes: ;test
mov dword[value], '1' ;value to push
call psh ;push it
mov dword[value], '2'
call psh
mov dword[value], '3'
call psh
call pp ;pop last value
call write ;write it
call pp
call write
call pp
call write
ret
« Last Edit: January 17, 2012, 05:07:52 AM by XenoReseller »

Offline Mathi

  • Jr. Member
  • *
  • Posts: 82
  • Country: in
    • Win32NASM
Re: First program, segmentation fault (cat, linux)
« Reply #10 on: January 17, 2012, 07:17:41 AM »
Quote
add eax, dword [offset] ;eax contains address of current stack item (stack address + offset)

Shouldn't this be,

add eax, byte [offset] ;eax contains address of current stack item (stack address + offset)

Since you have reserved only 1 byte for offset.  (offset resb 1)

(in both push and pop routines).


Offline XenoReseller

  • Jr. Member
  • *
  • Posts: 7
Re: First program, segmentation fault (cat, linux)
« Reply #11 on: January 17, 2012, 07:42:35 AM »
Quote
add eax, dword [offset] ;eax contains address of current stack item (stack address + offset)

Shouldn't this be,

add eax, byte [offset] ;eax contains address of current stack item (stack address + offset)

Since you have reserved only 1 byte for offset.  (offset resb 1)

(in both push and pop routines).
It reports mismatch in operand sizes. I assumed if I declared an operation larger than the variable it would be zero-extended.

Offline Mathi

  • Jr. Member
  • *
  • Posts: 82
  • Country: in
    • Win32NASM
Re: First program, segmentation fault (cat, linux)
« Reply #12 on: January 17, 2012, 08:50:34 AM »
My bad..., the operand combination is invalid.
Still the addition of a byte from [offset] to eax can be done in a few statements.

mov ecx,0
mov cl,byte [offset]
add eax,ecx

instead of,  add eax, byte [offset]

Same holds good for SUB instruction also.

Alternatively, you can reserve DWORD for offset :)
and use dword [offset]  everywhere in your program.


Offline Frank Kotler

  • NASM Developer
  • Hero Member
  • *****
  • Posts: 2667
  • Country: us
Re: First program, segmentation fault (cat, linux)
« Reply #13 on: January 18, 2012, 01:40:20 AM »
"add eax, byte [offset]" is what we want, allright, but there's no such instruction. In general, both operands have to be the same size. There are exceptions. "movzx", for example, will zero-extend a byte or word (movsx sign-extends it). Pity we don't have an "addzx"! But wait... suppose we did "movzx eax, byte [offset]" first, and then "add eax, stack"?


Code: [Select]
%define sys_write 4
section .bss
stack: resb 256 ;256 byte stack, room for 64, 4-byte elements
offset: resb 1 ;1-byte offset
value: resb 4 ;4-byte element

section .text
global _start

_start:
mov byte[offset],0 ;;; better initialize to 0
call tes ;test the routines
mov eax, 1 ;exit syscall
mov ebx, 0 ;0, no error
int 80h ; kernel interrupt

psh: ;push routine
; mov eax, stack ;eax now contains the stack address
; add eax, dword [offset] ;eax contains address of current stack item (stack address + offset)

    movzx eax, byte [offset]
    add eax, stack

mov ebx, dword [value] ;ebx contains the current value to push
mov [eax],ebx ;value(address specified in eax) now contains the value that was to be pushed
add byte [offset], 4 ;increment offset for next element
ret
pp: ;pop routine
sub byte [offset], 4 ;offset needs to be decremented to access last written element

; mov eax, stack ;eax now contains the stack address
; add eax, dword [offset] ;eax contains address of current stack item (stack address + offset)

    movzx eax, byte [offset]
    add eax, stack

mov ebx, dword [eax] ;ebx contains the value to pop
mov dword [value],ebx ;value now contains the 'popped' value
ret
write:
mov eax, sys_write ;write syscall
mov ebx, 1 ;stdout
; mov ecx, [value] ;value to write
    mov ecx, value ; address(!) of our buffer
mov edx, 1 ;single byte for testing
int 80h ; kernel interrupt
ret
tes: ;test
mov dword[value], '1' ;value to push
call psh ;push it
mov dword[value], '2'
call psh
mov dword[value], '3'
call psh
call pp ;pop last value
call write ;write it
call pp
call write
call pp
call write
ret

I had to make one other small change... sys_write wants the address in ecx, you were putting the "[contents]" in ecx. This appears to work. I can see future problems, in that we don't check for "stack" overflow (or underflow), but it gets around the size-mismatch problem. Easier to make offset a dword, I suspect...

I think I'd take a slightly different approach. I think, instead of keeping an "offset", I'd keep a "stackpointer" (definitely want to be a dword), with the stack address and the offset already added, and keep this up-to-date...

Code: [Select]
    mov eax, [stackpointer]
    add eax, 4 ; or sub
    mov ebx, [eax] ; or vice-versa
    mov [stackpointer], eax
...
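
Fleshed out a bit, the "stackpointer" idea could look like the sketch below, with the overflow and underflow checks mentioned above added. The stack_end label, the choice of ebx for the value, and the 0 / -1 status returned in eax are all assumptions made for this example, not anything from the earlier code.

```nasm
; nasm -f elf32 mystack.asm && ld -m elf_i386 mystack.o -o mystack
; ("mystack" is just a hypothetical file name)
SECTION .data
stackpointer: dd stack      ; address of the next free slot, starts at the base

SECTION .bss
stack: resd 64              ; room for 64 dword elements
stack_end:                  ; hypothetical label: first address past the stack

SECTION .text
global _start

; psh: push the dword in ebx. eax = 0 on success, -1 if the stack is full.
psh:
    mov eax, [stackpointer]
    cmp eax, stack_end
    jae .full               ; no free slot left
    mov [eax], ebx          ; store the value
    add eax, 4
    mov [stackpointer], eax ; keep the pointer up to date
    xor eax, eax
    ret
.full:
    mov eax, -1
    ret

; pp: pop a dword into ebx. eax = 0 on success, -1 if the stack is empty.
pp:
    mov eax, [stackpointer]
    cmp eax, stack
    jbe .empty              ; nothing has been pushed
    sub eax, 4              ; step back to the last value pushed
    mov ebx, [eax]
    mov [stackpointer], eax
    xor eax, eax
    ret
.empty:
    mov eax, -1
    ret

_start:
    mov ebx, 42
    call psh
    xor ebx, ebx            ; wipe ebx so we can see the pop restore it
    call pp
    mov eax, 1              ; sys_exit, with the popped value as exit status
    int 80h
```

Like the earlier versions, this "stack" grows upward. Running it and then "echo $?" should show 42.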

Oh... constants! I never use it, so I'd have to look up whether it's ".rdata" or ".rodata" that Nasm uses for a "constant" (unwriteable) data section. You could put constants in "section .text" with the same result (be careful not to put 'em where they'll be executed!). I "just don't" write to stuff in .data or .bss if I want it to be "constant". :)

.bss is nominally "uninitialized". This is literally true in an old dos .com file. Every other executable format I know of initializes .bss to zeros. I'm not sure we're supposed to count on it, but it is...

Best,
Frank




Offline XenoReseller

  • Jr. Member
  • *
  • Posts: 7
Re: First program, segmentation fault (cat, linux)
« Reply #14 on: January 18, 2012, 07:56:15 AM »
Thanks! I have another piece of code I was starting, which prints all of its parameters. Here is what I have:

Code: [Select]
SECTION .data
newline: db 10
SECTION .bss
argc: resb 1
param: resb 4
SECTION .text
global _start
_start:
pop eax
mov byte[argc], al
pop eax
run:
pop eax
mov dword[param], eax
sub byte[argc], 1

mov eax, 4
mov ebx, 1
mov ecx, dword[param]
mov edx, 4
int 80h

mov al, 00
cmp byte[argc], al

je exit
jmp run
exit:
mov eax, 4 ;sys_write
mov ebx, 1
mov ecx, newline
mov edx, 1
int 80h

mov eax, 1 ;sys_exit
mov ebx, 0
int 80h

Let's assume the goal was to print up to 4-byte parameters with no spaces... Did I do it in a "suitable" manner? If not, what would you change? Or is there just a super easy way to print all the parameters like echo? (That's what I'm building up to.)

Also, on my system, if I pass 'FF' it prints 'FFE', and if I pass 'F' it prints 'FSE'. Something is wrong with me not cleaning out the memory... but I'm not sure how to fix it.
« Last Edit: January 18, 2012, 08:04:51 AM by XenoReseller »