Author Topic: First program, segmentation fault (cat, linux) (Read 41390 times)

XenoReseller · « **on:** January 16, 2012, 01:44:47 AM »

This was my first shot at assembly after reading a couple of pages from various sources, but I'm getting a segmentation fault... For all I know my code could be COMPLETELY screwed up. Any help?

Code: [Select]

SECTION .bss  ; mutable/modifiable variables
	msg: resb 1


SECTION .text
	global _start

	_start:
		call read
		call write
		mov eax, msg
		cmp eax, 10
		jne _start
		mov eax, 1
		int 80h

	read:
		mov eax, 3
		push 1
		mov ebx, [msg]
		push ebx
		push 0
		int 80h
		ret

	write:
		mov eax, 4
		push 1
		mov ebx, [msg]
		push ebx
		push 1
		int 80h
		ret

I got it working without using push:

Code: [Select]

SECTION .bss  ; mutable/modifiable variables
	msg: resb 1


SECTION .text
	global _start
	_start:
		call read
		call write
		mov eax, msg
		cmp eax, 10
		jne _start
		mov eax, 1
		int 80h

	read:
		mov eax, 3
		mov ebx, 0
		mov ecx, msg
		mov edx, 1
		int 80h
		ret

	write:
		mov eax, 4
		mov ebx, 1
		mov ecx, msg
		mov edx, 1
		int 80h
		ret

Frank Kotler · « **Reply #1 on:** January 16, 2012, 03:33:30 AM »

No, your code isn't COMPLETELY screwed up. The "exit" is right.

It does have some problems!

Code: [Select]

SECTION .bss  ; mutable/modifiable variables
	msg: resb 1


SECTION .text
	global _start

	_start:
		call read
		call write
		mov eax, msg

This puts the address of msg into eax. It's 0x8049000-and-some - never 10. You want "[msg]", ("[contents]" of memory) but I'm not sure that'll do what you want, either...

Code: [Select]

		cmp eax, 10
		jne _start
		mov eax, 1
		int 80h

Okay, but now you go into a weird mix of C and sys_call calling conventions...

Code: [Select]

	read:
		mov eax, 3
		push 1
		mov ebx, [msg]
		push ebx
		push 0
		int 80h
		ret

Here's where your segfault comes from. It is not obvious to beginners, but "call" and "ret" use the stack. "call read" put the return address (the address right after this instruction) on the stack. When you get to "ret", you return to the address on the stack. Since you pushed 0... program go "boom".
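To make the stack's role concrete, here is a minimal sketch (the label "read_byte" and the choice to save ebx are illustrative assumptions, not something from the original post). Anything a routine pushes has to be popped again before "ret", so that the return address is back on top of the stack:

```nasm
; nasm -f elf32 balance.asm && ld -m elf_i386 balance.o -o balance
section .bss
    msg: resb 1

section .text
    global _start

_start:
    call read_byte      ; pushes the return address, then jumps
    mov eax, 1          ; sys_exit
    mov ebx, 0          ; exit status 0
    int 80h

read_byte:              ; hypothetical helper name
    push ebx            ; save caller's ebx (now on top of the return address)
    mov eax, 3          ; sys_read
    mov ebx, 0          ; stdin
    mov ecx, msg        ; address of buffer
    mov edx, 1          ; read at most one byte
    int 80h
    pop ebx             ; undo the push: return address is on top again
    ret                 ; pops the return address and jumps to it
```

If the "pop ebx" were left out, "ret" would jump to whatever value ebx held, which is the same kind of crash as the pushed 0 above.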

The way a "read" should go is more like...

Code: [Select]

read:
    mov eax, 3 ; system call number for "read"
    mov ebx, 0 ; file descriptor to read from 0=stdin
    mov ecx, msg ; address of our buffer
    mov edx, 1 ; how many (max) bytes to read
    int 80h ; call kernel
    ret ; since we didn't push anything, this is okay

What really happens here is that sys_read on stdin doesn't return until you hit a linefeed (the 10 you're looking for). If the user types "abc(enter)", the system call doesn't return until we hit "enter", but since we only asked for one character, the "a" goes into the buffer (msg) and the "bc(enter)" stays in the OS's input buffer. If you exited right now, the shell would read the leftover "bc(enter)" as its next command and run the program "bc" (some sort of calculator, apparently). Since you go back and "call read" again, it'll get "b" the next time, then "c", then the 10 you're looking for. So it'll work, but it doesn't really do what it appears to.

There's a way to make sys_read return without waiting for the "enter", but it's complicated, and I'm not satisfied with the way I do it.

When you look for the 10, you do:

Code: [Select]

mov eax, [msg]

well... you put the address, not the contents, but... using eax gets the contents of msg - and the next three bytes! Chances are, the next three bytes are 0s in this case, so it'll probably work, but you probably want to use al to get just the one byte you've got in the buffer:

Code: [Select]

mov al, [msg]
cmp al, 10
...

You've got a similar problem in "write"...

Code: [Select]

	write:
		mov eax, 4
		push 1
		mov ebx, [msg]
		push ebx
		push 1
		int 80h
		ret

This, too, would segfault if you got to it. More like...

Code: [Select]

write:
    mov eax, 4 ; sys_call number for "sys_write"
    mov ebx, 1 ; file descriptor 1=stdout
    mov ecx, msg ; address of buffer
    mov edx, 1 ; how many?
    int 80h
    ret

See if that helps. There are some tutorials on the subject here:

http://asm.sourceforge.net/resources.html#tutorials

Best,
Frank

Frank Kotler · « **Reply #2 on:** January 16, 2012, 03:41:11 AM »

Ah! I should have waited for the edit!

Okay, you figured it out. Still better off with:

Code: [Select]

mov al, [msg] ; "[contents]" of memory - just a byte
cmp al, 10
jne _start
...
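Putting the pieces from this thread together, the whole program might look something like the sketch below. The check on the return value of sys_read and the "done" label are additions that are not in the original posts; they are there so the loop also stops at end-of-file instead of spinning forever.

```nasm
; nasm -f elf32 cat1.asm && ld -m elf_i386 cat1.o -o cat1
section .bss
    msg: resb 1             ; one-byte buffer

section .text
    global _start

_start:
    call read
    cmp eax, 1              ; sys_read returns the number of bytes read
    jne done                ; 0 = end of file, negative = error
    call write
    mov al, [msg]           ; just the one byte we read
    cmp al, 10              ; linefeed?
    jne _start
done:
    mov eax, 1              ; sys_exit
    mov ebx, 0              ; exit status 0
    int 80h

read:
    mov eax, 3              ; sys_read
    mov ebx, 0              ; stdin
    mov ecx, msg            ; address of buffer
    mov edx, 1              ; at most one byte
    int 80h
    ret

write:
    mov eax, 4              ; sys_write
    mov ebx, 1              ; stdout
    mov ecx, msg            ; address of buffer
    mov edx, 1              ; one byte
    int 80h
    ret
```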

Best,
Frank

XenoReseller · « **Reply #3 on:** January 16, 2012, 03:52:55 AM »

Quote from: Frank Kotler on January 16, 2012, 03:41:11 AM

Ah! I should have waited for the edit!

Okay, you figured it out. Still better off with:
Code: [Select]

mov al, [msg] ; "[contents]" of memory - just a byte
cmp al, 10
jne _start
...

Best,
Frank

No, I'm glad you went through the original code. Now I know that the stack plays a key role with calling/returning. The comparison was still eluding me though! Thanks very much!

Another question, it appears that registers are used for syscalls, correct? What else is the stack used for?

Frank Kotler · « **Reply #4 on:** January 16, 2012, 01:31:23 PM »

Correct, syscalls use registers - in Linux. If you were using BSD, parameters would be passed on the stack... and we'd call the int 80h kinda like...

Code: [Select]

push 1 ; length
push msg
push 1 ; stdout
mov eax, 4 ; sys_write
call kernel_call
add esp, 12 ; "remove" parameters from stack
...

kernel_call:
int 80h
ret

BSD's int 80h doesn't actually need the return address to be there - but it expects the parameters to be in a position on the stack as if it were. We can eliminate the call-and-return...

Code: [Select]

push 1 ; length
push msg
push 1 ; stdout
mov eax, 4 ; sys_write
push eax ; or any "dummy" value
int 80h
add esp, 16 ; "remove" parameters from stack

I've never run BSD, but I'm "pretty sure" that works.

If you were calling a C function, parameters would be passed on the stack:

Code: [Select]

; nasm -f elf32 hwc.asm
; gcc hwc.o -o hwc -m32
; (only need the "-m32" on 64-bit systems)
; ./hwc

global main
extern printf

section .data
    fmtstr db 'Hello, World!',10,0
section .text
    main:
    
    pushad
    
    push dword fmtstr
    call printf
    add esp,4
    
    popad
    
    ret

If we were using Windows, the Windows API expects parameters on the stack, but there's a subtle difference. Windows APIs are "stdcall", in which "callee cleans up stack". We wouldn't need the "add esp, ?" after the call - the API (which we don't normally see code for) ends in "ret ?" instead of a plain "ret", which "removes" the parameters for us. Quite a convenient calling convention, actually. The called function needs to know how many parameters were passed - won't work for something like "printf" which can take a variable number of parameters!
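The difference between "caller cleans up" and "callee cleans up" can be sketched without any Windows API at all. The two functions below, "add_cdecl" and "add_stdcall", are made-up names for illustration; they do the same thing and differ only in who removes the parameters. This is a Linux sketch of the convention, not actual Windows code:

```nasm
; nasm -f elf32 conv.asm && ld -m elf_i386 conv.o -o conv
; ./conv ; echo $?      (the exit status is the sum computed below)
section .text
    global _start

_start:
    ; C style: the caller removes the parameters
    push 2
    push 1
    call add_cdecl
    add esp, 8              ; two dwords were pushed, so remove 8 bytes
    mov esi, eax            ; keep the first result (1 + 2)

    ; stdcall style: the callee removes the parameters
    push 4
    push 3
    call add_stdcall        ; no "add esp" needed here
    add esi, eax            ; 3 + (3 + 4)

    mov eax, 1              ; sys_exit
    mov ebx, esi            ; exit status = 10
    int 80h

add_cdecl:
    mov eax, [esp + 4]      ; first parameter (return address is at [esp])
    add eax, [esp + 8]      ; second parameter
    ret                     ; plain ret: parameters stay on the stack

add_stdcall:
    mov eax, [esp + 4]
    add eax, [esp + 8]
    ret 8                   ; return, then remove 8 bytes of parameters
```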

We can also use the stack for "local" (or "stack" or "automatic") variables...

Code: [Select]

my_thing:
; set up a "stack frame"
    push ebp ; save caller's ebp
    mov ebp, esp ; "frame pointer"
    sub esp, 4 ; room for a single local variable
; initialize the local variable
    mov dword [ebp - 4], 42
; do some stuff...
    mov eax, [ebp - 4] ; "return 42" (return value goes in eax)
; destroy the stack frame - this "frees" memory used for our local variable
    mov esp, ebp
    pop ebp ; restore caller's ebp
    ret

(we need to be careful not to alter ebp while all this is going on!) Note that passed parameters (if any) are at ebp + 8, 12, 16, etc. and local variables are at ebp - 4, etc.
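As a sketch of how parameters and a local variable sit on either side of ebp, here is a small function called the C way. The name "scaled_sum" and what it computes are invented for the example:

```nasm
; nasm -f elf32 frame.asm && ld -m elf_i386 frame.o -o frame
; ./frame ; echo $?     (exit status is the function's return value)
section .text
    global _start

_start:
    push 5                  ; second parameter
    push 4                  ; first parameter
    call scaled_sum         ; hypothetical function: (a + b) * 2
    add esp, 8              ; caller removes the parameters
    mov ebx, eax            ; exit status = 18
    mov eax, 1              ; sys_exit
    int 80h

scaled_sum:
    push ebp                ; save caller's ebp
    mov ebp, esp            ; frame pointer
    sub esp, 4              ; one local variable at [ebp - 4]

    mov eax, [ebp + 8]      ; first parameter
    add eax, [ebp + 12]     ; second parameter
    mov [ebp - 4], eax      ; local = a + b
    mov eax, [ebp - 4]
    add eax, [ebp - 4]      ; return value = local * 2

    mov esp, ebp            ; free the local variable
    pop ebp                 ; restore caller's ebp
    ret
```

[ebp] holds the saved ebp and [ebp + 4] holds the return address, which is why the first parameter is at [ebp + 8].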

There's also push and pop, but you knew that. I think that's everything the stack is ordinarily used for...

Best,
Frank

XenoReseller · « **Reply #5 on:** January 16, 2012, 05:59:05 PM »

Thanks, just one more question... I'm trying to create my own stack using a 256-byte variable. I have a 1-byte variable for the offset and each element is 4 bytes long so I increment the offset by 4. I then try:

Code: [Select]

mov eax, stack ; move stack address into eax
add eax, [offset] ; add offset address to eax

At this point I'm confused as to how to access the address in eax. [var] is the contents while var is the address. But registers don't have addresses in the same way, so what do [eax] and eax mean?

Frank Kotler · « **Reply #6 on:** January 16, 2012, 08:31:46 PM »

I'm not sure I understand the question, but I think you've got a "problem" here...

Code: [Select]

mov eax, stack ; move stack address into eax
add eax, [offset] ; add offset address to eax

If I understand you, "offset" is a one-byte variable. But adding to eax will use four bytes (32 bits) - your one byte of "offset" plus the following three bytes (which may be zeros... if we're lucky). I would suggest using a dword for "offset", even if your intended largest value is 255. Ummm... maybe something like...

Code: [Select]

section .data
    offset dd 0
section .bss
    stack resd 256 ; 256 dwords = 1024 bytes
section .text
    mov eax, stack
    add eax, [offset]
; now put something on the "stack"
    mov dword [eax], 42
    add dword [offset], 4

; do other things

; now get our value back off the "stack"
    sub dword [offset], 4 ; step back to the last value written first
    mov eax, stack
    add eax, [offset]
    mov ebx, [eax]

Now we should have 42 in ebx, and the "stack" (plus its "offset") should be back where it was when we started. In my implementation, the "stack" works upwards to higher memory, whereas the real "stack" works downward from the top of memory. This may not be what you have in mind. Maybe a more concrete, or more complete example of what you're trying to do would clarify my mind (or maybe not...).

In any case, "[eax]" would refer to the contents of memory at the address held in eax. Note that in some cases, we need to specify the size of the operation - when Nasm can't tell by the size of the register we're using...
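A short sketch of when the size has to be spelled out (the buffer name "buf" is just for the example). With a register operand Nasm can work out the size by itself; with an immediate it cannot:

```nasm
; nasm -f elf32 size.asm && ld -m elf_i386 size.o -o size
section .bss
    buf: resd 1

section .text
    global _start

_start:
    mov eax, buf            ; eax = address of buf
    mov ebx, 42
    mov [eax], ebx          ; size known from ebx: stores 4 bytes
    mov [eax], bl           ; size known from bl: stores 1 byte
    mov dword [eax], 42     ; immediate: we must say it's 4 bytes
    mov byte [eax], 42      ; same value, but only 1 byte is written
;   mov [eax], 42           ; error: operation size not specified

    mov eax, 1              ; sys_exit
    mov ebx, 0
    int 80h
```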

I don't know if that really answers your question or not. Don't hesitate to ask for clarification!

Best,
Frank

XenoReseller · « **Reply #7 on:** January 17, 2012, 01:53:04 AM »

Well, here was my broken (really borked) stack at the time I wrote that question:

tes: was just a label I was using to test the stack.
Depending on what I changed it to, I'd get segfaults... Other times, it would just print out 3 of the same number that wasn't any of the ones that it should have been.

Also, I noticed you did add dword [offset], 4 even though it was declared in the .data section? I thought variables declared there were constants?

I really appreciate this, you've been very helpful.

Code: [Select]

%define sys_write 4

section .bss
	stack: resb 256
	offset: resb 1
	value: resb 4

section .text
	global _start

	_start:
		call tes
		mov eax, 1
		int 80h

	psh:
		mov eax, stack
		mov eax, [offset]
		mov eax, [value]
		add byte[offset], 0x04
		ret
	pp:
		sub byte[offset], 0x04
		mov eax, stack
		add eax, [offset]
		mov ebx, value
		mov [ebx], eax
		ret
	write:
		mov eax, sys_write
		mov ebx, 1
		mov ecx, [value]
		mov edx, 1
		int 80h
		ret
	tes:
		mov dword[value], '1'
		call psh
		mov dword[value], '2'
		call psh
		mov dword[value], '3'
		call psh
		call pp
		call write
		call pp
		call write
		call pp
		call write
		ret

Mathi · « **Reply #8 on:** January 17, 2012, 03:42:51 AM »

I corrected some problems in your routine (not tested).

Quote

Also, I noticed you did add dword [offset], 4 even though it was declared in the .data section? I thought variables declared there were constants?

No they are NOT constants. (you can read/write to the memory address offset)

It is better to initialize content at memory address 'offset' to zero. (bss => uninitialized variable declaration)

Looking at the number of bytes you reserved, you should be able to push and pop 64 values (256/4).

It is just a matter of time for you to grasp the difference between memory and memory contents

mov eax, ebx ;; copy value in ebx register to eax register
mov [eax], ebx ;; copy value in ebx register to memory location pointed by eax.. In this case eax is assumed to have a memory address as its value(an address to which the program has access to read/write. otherwise it will result in seg fault).

When you use square brackets in your instruction , you are trying to access memory.

** Except for LEA instruction. (Load effective address doesn't deal with Memory contents).
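A small sketch of the three cases side by side, assuming a dword variable named "var" (a name chosen only for this example):

```nasm
; nasm -f elf32 brackets.asm && ld -m elf_i386 brackets.o -o brackets
section .data
    var: dd 1234            ; a dword holding the value 1234

section .text
    global _start

_start:
    mov eax, var            ; eax = address of var
    mov ebx, [var]          ; ebx = 1234, the contents of memory at var
    mov ecx, [eax]          ; ecx = 1234 too, since eax holds that address
    lea edx, [eax + 4]      ; edx = address of var + 4, memory is not read

    mov eax, 1              ; sys_exit
    mov ebx, 0
    int 80h
```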

It is better to specify the size of data you are copying/dealing with (byte, word, dword) when you use []

like ,

add eax, byte [offset] ;; add 1 byte data from memory address offset to eax
or
add eax, word [offset]
or
add eax, dword [offset]

depending on your intent.

Code: [Select]

%define sys_write 4

section .bss
	stack: resb 256
	offset: resb 1
	value: resb 4

section .text
	global _start

	_start:
		mov byte[offset],0 ;;; better initialize to 0
		call tes
		mov eax, 1
		int 80h

	psh:
		mov eax, stack
		add eax, byte [offset]
		mov ebx, dword [value]
		mov [eax],ebx
		add byte[offset], 0x04
		ret
	pp:
		sub byte[offset], 0x04
		mov eax, stack
		add eax, byte [offset]
		mov ebx, [eax]
		mov dword [value],ebx
		ret
	write:
		mov eax, sys_write
		mov ebx, 1
		mov ecx, [value]
		mov edx, 1
		int 80h
		ret
	tes:
		mov dword[value], '1'
		call psh
		mov dword[value], '2'
		call psh
		mov dword[value], '3'
		call psh
		call pp
		call write
		call pp
		call write
		call pp
		call write
		ret

All the best.

Thanks,
Mathi.

XenoReseller · « **Reply #9 on:** January 17, 2012, 04:26:34 AM »

Thanks, I appreciate the input. I understand the theory, but I don't actually know much syntax. So I can figure out these things in my head, but I forget what syntax does what (i.e. [eax] vs. eax). And in most cases I don't even know half these things exist.

I'll take a look at the code. Thanks very much. After modifying your code to fix the operand size mismatch errors on the adds, it assembles but then segfaults when run.

Here is the code with my comments to display my logic. It seems sound... but obviously this is where theory starts to matter less.

Code: [Select]

%define sys_write 4
section .bss
	stack: resb 256 ;256 byte stack, room for 64, 4-byte elements
	offset: resb 1 ;1-byte offset
	value: resb 4 ;4-byte element

section .text
	global _start

	_start:
		mov byte[offset],0 ;;; better initialize to 0
		call tes ;test the routines
		mov eax, 1 ;exit syscall
		mov ebx, 0 ;0, no error
		int 80h ; kernel interrupt

	psh: ;push routine
		mov eax, stack ;eax now contains the stack address
		add eax, dword [offset] ;eax contains address of current stack item (stack address + offset)
		
		mov ebx, dword [value] ;ebx contains the current value to push
		mov [eax],ebx ;value(address specified in eax) now contains the value that was to be pushed
		add byte [offset], 4 ;increment offset for next element
		ret
	pp: ;pop routine
		sub byte [offset], 4 ;offset needs to be decremented to access last written element
		
		mov eax, stack ;eax now contains the stack address
		add eax, dword [offset] ;eax contains address of current stack item (stack address + offset)
		
		mov ebx, dword [eax] ;ebx contains the value to pop
		mov dword [value],ebx ;value now contains the 'popped' value
		ret
	write:
		mov eax, sys_write ;write syscall
		mov ebx, 1 ;stdout
		mov ecx, [value] ;value to write
		mov edx, 1 ;single byte for testing
		int 80h ; kernel interrupt
		ret
	tes: ;test 
		mov dword[value], '1' ;value to push
		call psh ;push it
		mov dword[value], '2'
		call psh
		mov dword[value], '3'
		call psh
		call pp ;pop last value
		call write ;write it
		call pp
		call write
		call pp
		call write
		ret

Mathi · « **Reply #10 on:** January 17, 2012, 07:17:41 AM »

Quote

add eax, dword [offset] ;eax contains address of current stack item (stack address + offset)

Shouldn't this be,

add eax, byte [offset] ;eax contains address of current stack item (stack address + offset)

Since you have reserved only 1 byte for offset. (offset resb 1)

(in both push and pop routines).

XenoReseller · « **Reply #11 on:** January 17, 2012, 07:42:35 AM »

Quote from: Mathi on January 17, 2012, 07:17:41 AM

Quote
add eax, dword [offset] ;eax contains address of current stack item (stack address + offset)

Shouldn't this be,

add eax, byte [offset] ;eax contains address of current stack item (stack address + offset)

Since you have reserved only 1 byte for offset. (offset resb 1)

(in both push and pop routines).

It reports mismatch in operand sizes. I assumed if I declared an operation larger than the variable it would be zero-extended.

Mathi · « **Reply #12 on:** January 17, 2012, 08:50:34 AM »

My bad..., the operand combination is invalid.
Still the addition of a byte from [offset] to eax can be done in a few statements.

mov ecx,0
mov cl,byte [offset]
add eax,ecx

instead of, add eax, byte [offset]

Same holds good for SUB instruction also.

Alternatively, you can reserve DWORD for offset

and use dword [offset] everywhere in your program.

Frank Kotler · « **Reply #13 on:** January 18, 2012, 01:40:20 AM »

"add eax, byte [offset]" is what we want, allright, but there's no such instruction. In general, both operands have to be the same size. There are exceptions. "movzx", for example, will zero-extend a byte or word (movsx sign-extends it). Pity we don't have an "addzx"! But wait... suppose we did "movzx eax, byte [offset]" first, and then "add eax, stack"?

Code: [Select]

%define sys_write 4
section .bss
	stack: resb 256 ;256 byte stack, room for 64, 4-byte elements
	offset: resb 1 ;1-byte offset
	value: resb 4 ;4-byte element

section .text
	global _start

	_start:
		mov byte[offset],0 ;;; better initialize to 0
		call tes ;test the routines
		mov eax, 1 ;exit syscall
		mov ebx, 0 ;0, no error
		int 80h ; kernel interrupt

	psh: ;push routine
;		mov eax, stack ;eax now contains the stack address
;		add eax, dword [offset] ;eax contains address of current stack item (stack address + offset)

    movzx eax, byte [offset]
    add eax, stack
		
		mov ebx, dword [value] ;ebx contains the current value to push
		mov [eax],ebx ;value(address specified in eax) now contains the value that was to be pushed
		add byte [offset], 4 ;increment offset for next element
		ret
	pp: ;pop routine
		sub byte [offset], 4 ;offset needs to be decremented to access last written element
		
;		mov eax, stack ;eax now contains the stack address
;		add eax, dword [offset] ;eax contains address of current stack item (stack address + offset)

    movzx eax, byte [offset]
    add eax, stack
		
		mov ebx, dword [eax] ;ebx contains the value to pop
		mov dword [value],ebx ;value now contains the 'popped' value
		ret
	write:
		mov eax, sys_write ;write syscall
		mov ebx, 1 ;stdout
;		mov ecx, [value] ;value to write
    mov ecx, value ; address(!) of our buffer
		mov edx, 1 ;single byte for testing
		int 80h ; kernel interrupt
		ret
	tes: ;test 
		mov dword[value], '1' ;value to push
		call psh ;push it
		mov dword[value], '2'
		call psh
		mov dword[value], '3'
		call psh
		call pp ;pop last value
		call write ;write it
		call pp
		call write
		call pp
		call write
		ret

I had to make one other small change... sys_write wants the address in ecx, you were putting the "[contents]" in ecx. This appears to work. I can see future problems, in that we don't check for "stack" overflow (or underflow), but it gets around the size-mismatch problem. Easier to make offset a dword, I suspect...
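For the overflow/underflow problem, one possible approach is sketched below. It assumes "offset" has been changed to a dword, and the convention of returning 0 or 1 in eax is my own invention for the example, not something from the code above:

```nasm
; nasm -f elf32 checked.asm && ld -m elf_i386 checked.o -o checked
%define STACK_BYTES 256

section .bss
    stack:  resb STACK_BYTES
    offset: resd 1              ; dword, so it works with 32-bit registers
    value:  resd 1

section .text
    global _start

_start:
    mov dword [offset], 0
    call pp                     ; pop from an empty stack: should fail
    mov ebx, eax                ; exit status 1 shows underflow was caught
    mov eax, 1                  ; sys_exit
    int 80h

; psh: push [value]. Returns eax = 0 on success, 1 if the stack is full.
; Clobbers ebx.
psh:
    mov eax, [offset]
    cmp eax, STACK_BYTES
    jae .full                   ; no room for another dword
    mov ebx, [value]
    mov [stack + eax], ebx
    add dword [offset], 4
    xor eax, eax
    ret
.full:
    mov eax, 1
    ret

; pp: pop into [value]. Returns eax = 0 on success, 1 if the stack is empty.
; Clobbers ebx.
pp:
    mov eax, [offset]
    test eax, eax
    jz .empty                   ; nothing to pop
    sub eax, 4
    mov [offset], eax
    mov ebx, [stack + eax]
    mov [value], ebx
    xor eax, eax
    ret
.empty:
    mov eax, 1
    ret
```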

I think I'd take a slightly different approach. I think, instead of keeping an "offset", I'd keep a "stackpointer" (definitely want to be a dword), with the stack address and the offset already added, and keep this up-to-date...

Code: [Select]

    mov eax, [stackpointer]
    add eax, 4 ; or sub
    mov ebx, [eax] ; or vice-versa
    mov [stackpointer], eax
...
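Fleshed out a little, the "stackpointer" idea might look like the sketch below. Passing the value in ebx instead of through a "value" variable is a simplification made for this example, and there are no overflow checks:

```nasm
; nasm -f elf32 sp.asm && ld -m elf_i386 sp.o -o sp
; ./sp ; echo $?
section .bss
    stack:        resd 64       ; room for 64 dwords
    stackpointer: resd 1        ; address of the next free slot

section .text
    global _start

_start:
    mov dword [stackpointer], stack ; start with an empty stack

    mov ebx, '1'
    call psh
    mov ebx, '2'
    call psh
    call pp                     ; ebx = '2'
    call pp                     ; ebx = '1'

    mov eax, 1                  ; sys_exit
    sub ebx, '0'                ; exit status 1: the first value pushed
    int 80h

psh:                            ; push ebx
    mov eax, [stackpointer]
    mov [eax], ebx              ; store at the free slot
    add eax, 4                  ; then move up to the next one
    mov [stackpointer], eax
    ret

pp:                             ; pop into ebx
    mov eax, [stackpointer]
    sub eax, 4                  ; step back to the last value
    mov ebx, [eax]
    mov [stackpointer], eax
    ret
```

Like the "offset" version, this stack grows upward in memory.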

Oh... constants! I never use it, so I'd have to look up whether it's ".rdata" or ".rodata" that Nasm uses for a "constant" (unwriteable) data section. You could put constants in "section .text" with the same result (be careful not to put 'em where they'll be executed!). I "just don't" write to stuff in .data or .bss if I want it to be "constant".
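For what it's worth, in Nasm's ELF output format the read-only data section is named ".rodata" (".rdata" is the name used in the Win32/COFF formats). A minimal Linux sketch, with the label names made up for the example:

```nasm
; nasm -f elf32 ro.asm && ld -m elf_i386 ro.o -o ro
section .rodata                 ; read-only data in ELF output
    greeting: db 'constant', 10
    greeting_len equ $ - greeting

section .text
    global _start

_start:
    mov eax, 4                  ; sys_write
    mov ebx, 1                  ; stdout
    mov ecx, greeting           ; address of the string
    mov edx, greeting_len
    int 80h

;   mov byte [greeting], 'C'    ; assembles, but would be expected to
                                ; segfault at run time on Linux

    mov eax, 1                  ; sys_exit
    mov ebx, 0
    int 80h
```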

.bss is nominally "uninitialized". This is literally true in an old dos .com file. Every other executable format I know of initializes .bss to zeros. I'm not sure we're supposed to count on it, but it is...

Best,
Frank

XenoReseller · « **Reply #14 on:** January 18, 2012, 07:56:15 AM »

Thanks! I have another piece of code I was starting, which prints all of its parameters... Here is what I have:

Code: [Select]

SECTION .data
	newline: db 10
SECTION .bss
	argc: resb 1
	param: resb 4
SECTION .text
	global _start
	_start:
		pop eax
		mov byte[argc], al
		pop eax
	run:
		pop eax
		mov dword[param], eax
		sub byte[argc], 1

		mov eax, 4
		mov ebx, 1
		mov ecx, dword[param]
		mov edx, 4
		int 80h

		mov al, 00
		cmp byte[argc], al

		je exit
		jmp run
	exit:
		mov eax, 4 ;sys_write
		mov ebx, 1
		mov ecx, newline
		mov edx, 1
		int 80h

		mov eax, 1 ;sys_exit
		mov ebx, 0
		int 80h

Let's assume the goal was to print up to 4-byte parameters with no spaces... Did I do it in a "suitable" manner? If not, what would you change? Or is there just a super easy way to print all the parameters, like echo? (That's what I'm building up to.)

Also, on my system, if I pass 'FF' it'll print 'FFE', and 'F' will print 'FSE'. Something is wrong with me not cleaning out the memory... but I'm not sure how.
