Author Topic: Linux programing. (Read 33503 times)

lukus001 · « **on:** February 14, 2010, 03:08:27 PM »

Hi all.

I've started getting into Linux programming, and one of the things I wanted to do recently was read environment variables. The Linux documentation simply tells me to use "getenv()", which is rather abstract and says nothing about how Linux actually manages environment variables.

getenv() is part of libc (or similar; it was last week that I spent days searching around), which doesn't really help explain how it's done; it simply produces a result... I've spent a few days searching, reading the header files on my Linux system, the man pages, and various other resources, with no success.

I was referred to http://linuxasmtools.net in a previous topic I made when I was just a guest, so I had a look at the library they have there, and luckily there are a few examples for accessing environment variables. Reading them, I can piece some of it together, though not completely.

Obviously, the person who wrote that asm library knows what they're doing, but how does someone like me find information/documentation on the Linux system that's a bit more detailed than "use getenv()"? I'd like to know things like where environment variables are stored, how they are accessed, and so on. Or how can I find that out?

Cheers.

Keith Kanios · « **Reply #1 on:** February 14, 2010, 07:31:01 PM »

getenv() is the "direct" approach of accessing the name/value environment variable pairs.

As illustrated here, Linux loading an ELF binary itself involves the population of extra data, such as environment variable pointers, on the stack... this is what the referenced AsmLib utilizes.

Keith Kanios · « **Reply #2 on:** February 14, 2010, 07:40:39 PM »

A couple more references that may help piece things together.

1.) Environ (About.com)
2.) Writing Your Own Shell (Linux Gazette)

Frank Kotler · « **Reply #3 on:** February 15, 2010, 07:10:47 AM »

At the "_start:" label, "argc", the command line arguments, and the environment variables are on the stack. Since "_start:" is not "call"ed, there's no return address on the stack, the argument count is the first thing, followed by a zero-terminated array of pointers to zero-terminated command line arguments, followed by a zero-terminated array of pointers to zero-terminated "KEY=value" environment variables. There's no "envc", we have to watch for the terminating zero. We can either "pop" them off as we go along, or reference them "in place" - leaving them for future reference in case we forgot something.
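For readers more at home in C, the layout described above can be sketched as a linear search over a NULL-terminated array of "KEY=value" strings. This is only an illustration of the idea, not libc's actual getenv(); the name find_env is made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not libc's getenv): search a NULL-terminated array
 * of "KEY=value" strings, the same shape as the environment pointer array
 * found on the stack at _start. Returns a pointer to the value, or NULL. */
const char *find_env(char *const envp[], const char *key)
{
    size_t klen = strlen(key);

    for (size_t i = 0; envp[i] != NULL; i++) {   /* no "envc": stop at NULL */
        if (strncmp(envp[i], key, klen) == 0 && envp[i][klen] == '=')
            return envp[i] + klen + 1;           /* skip past "KEY=" */
    }
    return NULL;
}
```

Checking for the '=' right after the key matters: without it, a search for "USER" would also match "USERNAME=...".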

If C is involved, the "_start:" label is probably in the startup code, crt0.o or so, which rearranges the stack and calls "main". "main" probably does "push ebp" first thing, so "argc" would be at [ebp + 8], [ebp + 12] would be "**argv" - just a pointer to the array we already had, and at [ebp + 16], "**envp". I'm not sure this last is always there - is there a difference between "int main (int argc, char **argv, char **envp)" and just "int main (int argc, char **argv)"? Dunno. Would have to experiment, or read the code... if I cared.
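One way to double-check those frame offsets is to write the frame down as a struct and let the compiler do the counting. This is a sketch under assumptions: 32-bit cdecl, a "push ebp / mov ebp, esp" prologue, and startup code that passes envp as a third argument (glibc's does; the three-argument main is a common extension rather than something ISO C guarantees). The names main_frame32 and arg_offset32 are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical picture of main's stack frame as seen from ebp.
 * Pointers are modelled as uint32_t so the offsets are the same
 * no matter what machine this is compiled on. */
struct main_frame32 {
    uint32_t saved_ebp;    /* [ebp + 0]  pushed by main itself      */
    uint32_t return_addr;  /* [ebp + 4]  pushed by the "call main"  */
    uint32_t argc;         /* [ebp + 8]  */
    uint32_t argv;         /* [ebp + 12] */
    uint32_t envp;         /* [ebp + 16] */
};

/* Offset from ebp of the n-th argument (0-based) under 32-bit cdecl. */
size_t arg_offset32(unsigned n)
{
    return offsetof(struct main_frame32, argc) + 4u * n;
}
```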

Here's my attempt at what "getenv" would do, if it were hard-coded to "USER".

Code:

; nasm -f elf myfile.asm
; ld -o myfile myfile.o

; finds the USER environment variable, and says hi,
; using sys_writev - write several buffers at a time.

global _start

section .text
_start:
    nop
commence:
    mov eax, [esp] ; "argc"

; "lea eax, [8 + esp + eax * 4]" is start of environment variables
; 4 bytes for each arg in argc, plus argc itself, plus the terminating 0
; since we're going to be adding 4, calculate 4 short of that.
    lea eax, [4 + esp + eax * 4]

; fancy search algorithm :)
finduser:
    add eax, byte 4
    mov ecx, [eax]
    test ecx, ecx
    jz end_of_env ; terminating zero at end of env array - not found
    cmp dword [ecx], 'USER'
    jne finduser
    cmp byte [ecx + 4], '='
    jne finduser

; got it, advance to "value"
    add ecx, byte 5

; find the length
    or edx, byte -1
getlen:
    inc edx
    cmp byte [ecx + edx], 0
    jne getlen

    mov [username], ecx
    mov [name_len], edx
    jmp short write_it

end_of_env:
    mov [username], dword whodat
    mov [name_len], dword whodat_len

write_it:
    mov eax, 146	; __NR_writev
    mov ebx, 1		; stdout
    mov ecx, my_vector	; ptr, len, ptr, len...
    mov edx, 3		; three items to write
    int 80h

exit:
    mov eax, 1		; __NR_exit
    xor ebx, ebx	; exit status 0 (ebx still held 1 from the write)
    int 80h

;-----------------------------------
section .data

    greet db "Hello, "
    greet_len equ $ - greet
    
    coda db "! Welcome to Linux Assembly!", 10
    coda_len equ $ - coda
    
    whodat db "Unknown User"
    whodat_len equ $ - whodat
    
; sys_writev takes a "vector", ptr, len, ptr, len... N times.
    my_vector:
	dd greet
	dd greet_len
username:
        dd 0		; fill in at runtime
name_len:
        dd 0		; fill in at runtime
        dd coda
        dd coda_len

;------------------------------

I don't know if that's any help or not. If we start with "main", finding the array would be slightly different, but after that it would be much the same. Introducing sys_writev is an irrelevant complication - I just wanted to try it out. Dunno if it's faster than calling sys_write three times or not. Should be.

Best,
Frank

lukus001 · « **Reply #4 on:** February 15, 2010, 01:17:47 PM »

Thank you Keith and Frank for your responses; much appreciated.

I believe I should be able to do what I wanted now, so many thanks.

I also found out about /proc/self, which contains a file called 'environ' (a method that one of the asm lib samples uses). It's basically the same information but more kernel-oriented, if that's a correct way to describe it. So while I should be able to write my program now, thanks to you two :3, I'm wondering where I can find information about Linux so that I'll know about things like environment variables being on the stack, /proc/self, and so on; then, should I need to do something else in the future, I can be self-sufficient...

Any books anyone can recommend?
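As a rough illustration of the /proc/self/environ route: the file holds the same "KEY=value" strings, each terminated by a NUL byte. The sketch below reads the file and splits a buffer of that shape; the function names are made up, and the fixed-size buffers are an assumption for brevity. Note that, per proc(5), the file reflects the environment the program was started with, not later changes made with setenv() or putenv().

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: walk a buffer of NUL-separated "KEY=value" entries
 * (the format of /proc/self/environ) and store a pointer to each entry.
 * Returns the number of entries found, at most max. */
size_t split_environ(const char *buf, size_t len, const char *out[], size_t max)
{
    size_t n = 0, pos = 0;

    while (pos < len && n < max) {
        const char *end = memchr(buf + pos, '\0', len - pos);
        if (end == NULL)              /* unterminated tail: ignore it */
            break;
        if (end != buf + pos)         /* skip empty entries */
            out[n++] = buf + pos;
        pos = (size_t)(end - buf) + 1;
    }
    return n;
}

/* Read up to cap bytes of /proc/self/environ into buf. Returns the number
 * of bytes read, or 0 if the file cannot be opened (no /proc mounted). */
size_t read_proc_environ(char *buf, size_t cap)
{
    FILE *f = fopen("/proc/self/environ", "rb");
    size_t got;

    if (f == NULL)
        return 0;
    got = fread(buf, 1, cap, f);
    fclose(f);
    return got;
}
```

A caller would pass the byte count returned by read_proc_environ() as len to split_environ(), then print or search the resulting entries.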

Cheers,
Luke.

lukus001 · « **Reply #5 on:** February 19, 2010, 11:29:18 PM »

Heya all,

I wrote a program that *should* read each global (environment) variable, place each one on its own line, and output the result via standard out so it can run as a CGI script (it has a MIME type header too).

I already wrote a program that dumps the global variables via CGI; it was quite crude, but it worked at least :3. Now, in this newer version, the 'a32 lodsb' on line 78 produces a SIGSEGV (segmentation fault). ESI is set to the first global var pointer (or it should be), and the first lodsb earlier in the code succeeds, but the second instance does not.

Please advise :3

(find "null_check_b:" to quickly get to the specific problem line)

I'm sure there will be more errors after that one is fixed too.

Code:

section .text
	global _start

	_start:		


;Apache requires the Mime type to be specified, otherwise the program will fail
;Pre-declare Mime-type in text section, ending it with a double linefeed (10)
;"mime" for mime type and "mime_Length" for length

;Apache requires output Via STD-out,
;eax, 		~ system write - 4
;ebx, 		~ STD-out - 1
;ecx, 		~ pointer to message 
;edx, 		~ pointer to length
;int 		~ interrupt to invoke kernel - 0x80 

	mov		eax,4
        mov		ebx,1
        mov     	ecx,mime
        mov     	edx,mime_length
        int     	0x80



;Now Apache has the basics, start main code
;We want to read the Global variables one by one and print them on their own line
;A pointer to the global var is already on the stack, as per ELF Standards.

;first value on stack is a pointer to the number of arguments 
;Second value onwards are pointers to the actual arguments.
;Null designates the end of argument pointers 
;Value after Null is start of Global variables.

;use number of arguments as an offset to ESP to locate start of global vars
;keep a copy of global_vars starting address in global_point
;Store pointer to first global var in ESI 
;then get the value,  


	mov		eax,[esp]
	lea		eax,[esp+(eax+3)*4]

	mov 		[global_point],eax
	
	mov		esi,[eax]
	cld
	a32 lodsb	

;al now has the first byte from global vars, now we must check if it contains null.
;We need a place to start storing our data, so declare some Uninitialised space.
;name it global_vars


;Set edi to our global_vars storage area
;Ascii is an 8bit text format, so we have to check it by the byte.

;null_check_a 
;if the start of a pointer is null, no more global variables, end program.
;if null isnt found, store byte and run _b

;null_check_b
;stores each byte untill it hits null, signifying end of that variable.
;run null_b to load next global variable.


	mov		edi,global_vars
	mov	DWORD	[store_vars_start],global_vars

	null_check_a:
	test		al,al
	jz		null_a
	a32 stosb	
	
	null_check_b:

	a32 lodsb	

	test		al,al
	jz		null_b
	a32 stosb	
	jmp		null_check_b

	
	
;null_a
;Represents end of all global variables, add newline to global_var then run STD-out

;null_b
;represents end of current global variable, looping till it find the end, 
;Add new line to end

;global_points has original reference to begining of global variables
;increase it by one Dword so it points to the next value
;Store new value into esi, ready to run null_check_a



	null_a:
	mov		al,10
	a32 stosb	
	jmp		end
	


	null_b:
	mov		al,10
	a32 stosb	
	mov		ebx,[global_point]
	lea		ebx,[ebx+1*4]
	mov		[global_point],ebx
	mov		edi,[global_point]
	jmp		null_check_a



	end:
	mov		edx,global_vars
	sub	DWORD	edx,store_vars_start
	
;eax, 		~ system write - 4
;ebx, 		~ STD-out - 1
;ecx, 		~ pointer to message 
;edx, 		~ pointer to length
;int 		~ interrupt to invoke kernel - 0x80 

	mov		eax,4
        mov	DWORD	ebx,1
        mov     	ecx,store_vars_start
        ;mov     	edx
        int     	0x80

       mov     eax,1   ;system call number (sys_exit)
       int     0x80    ;call kernel




section .data


        mime:		db		'Content-type: text/html',10,10,'Hello World!',10
        mime_length:    db 		38

section .bss

	global_point:	resb	4
	
	global_vars: 	resb	16000

	store_vars_start:	resb	4

Cheers,
Luke

Frank Kotler · « **Reply #6 on:** February 20, 2010, 02:53:43 AM »

Well... "a32" isn't necessary on the "lodsb"s... but that isn't your problem. Your first "lodsb" is intended to be part of "null_check_a", I think. It works okay the first time through, but when you return to it, the 0xa is still in al from "null_b", and you're never going to find a null there! (It also causes 0xa to be stored twice, giving you a double-spaced effect.) You run esi all the way up to 0xC0000000 and segfault.

Then, in "null_b", there's a typo (I think). You put your "result" in edi - surely you meant esi. But you're not quite ready for that yet. Your "global_point" holds an address on the stack (a slot in the array of pointers), not a pointer to a string. Initially, you do mov esi, [eax] to "dereference" it (get the actual pointer to the string from our array of pointers). You need to do that again here... and it would be a good time to check if we've hit that null pointer that terminates the array:

Code:

	null_b:
	mov		al,10
	a32 stosb	
	mov		ebx,[global_point]
	lea		ebx,[ebx+1*4]
	mov		[global_point],ebx
	mov		esi,[global_point]
	mov 		esi, [esi]
	test esi, esi
	jz end
	jmp		null_check_a

I think that'll work better than expecting a pointer to a null string at the end.
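For comparison, here is roughly what the loop is trying to do, written in C. It is a sketch, not a translation of the code above: the name copy_env_lines and the bounds check are additions for the example. The two kinds of terminator are kept separate: a NULL pointer ends the array, a NUL byte ends each string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical C rendering of the intended loop: copy every string in a
 * NULL-terminated pointer array into out, each followed by '\n'.
 * Returns the number of bytes written, i.e. "end of copy minus start". */
size_t copy_env_lines(char *const envp[], char *out, size_t cap)
{
    size_t used = 0;                       /* like edi minus the buffer start */

    for (size_t i = 0; envp[i] != NULL; i++) {   /* NULL pointer: end of array */
        size_t len = strlen(envp[i]);      /* NUL byte: end of one string */

        if (used + len + 1 > cap)          /* string plus '\n' must fit */
            break;
        memcpy(out + used, envp[i], len);
        used += len;
        out[used++] = '\n';
    }
    return used;
}
```

The return value is the length that would be handed to sys_write in edx.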

Lessee, I don't think you're calculating the length to print correctly:

Code:

	end:
;	mov		edx,global_vars
;	sub	DWORD	edx,store_vars_start
	mov edx, edi  ; end of our "copy"
	sub edx, global_vars ; start of our "copy"

"store_vars_start" would be the address. "[store_vars_start]" would be the "[contents]"... but it's just "global_vars" so the subtraction is going to be zero. (you want the "[]"s in the final print routine, too - or just use "global_vars")

Oh yeah, back at the beginning...

Code:

;	lea		eax,[esp+(eax+3)*4]
	lea		eax,[esp + eax * 4 + 8]

"(eax+3)*4" multiplies out to "eax * 4 + 12", which I don't think is what you want, unless you intended to skip the first environment variable(?). The "eax * 4" skips the command line arguments (there's always at least one - the program name), and you want to skip "argc" itself, and the null that terminates the command line arguments... so +8 should be the start of environment variables.

There may (as always) be further errors...
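The offset arithmetic for finding the first environment pointer can be checked on a simulated stack. This is a sketch assuming 32-bit code at _start, with argc in the first dword; the function names and the sample stack contents in the usage note are invented.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offset from esp (at _start, 32-bit) to the first environment
 * pointer: 4 bytes for argc itself, 4 per argument pointer, and 4 for
 * the null that terminates the argument array. */
size_t env_offset32(uint32_t argc)
{
    return 4u + 4u * (size_t)argc + 4u;
}

/* Same idea on a simulated stack of 32-bit words, where stack[0] holds
 * argc: index of the first environment slot. */
size_t first_env_index(const uint32_t stack[])
{
    return env_offset32(stack[0]) / 4;
}
```

With a stack image of { 2, argv0, argv1, 0, env0, env1, 0 }, first_env_index() gives 4, the slot holding env0. By contrast, "(argc + 3) * 4" lands one dword further on and skips the first variable.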

Best,
Frank

Bryant Keller · « **Reply #7 on:** February 20, 2010, 07:42:51 AM »

Working with environment variables isn't that much harder than working with command line arguments. Here is a short program that displays the environment variables in an HTML table. I tried to comment the relevant parts and use macros to obscure the parts that don't really matter to what you are trying to figure out.

Code:

	BITS 32

	STDOUT_FILENO	EQU 1

	SYS_EXIT	EQU 1
	SYS_WRITE	EQU 4

	%define @ARG Ebp + 8
	%define @VAR Ebp - 8

	%macro CCall 1-*
	%push _CCall_
		%define %%proc %1
		%assign %$ii 0
		%rep %0-1
			%rotate -1
			Push DWORD %1
			%assign %$ii %$ii + 1
		%endrep
		Call %%proc
		Add Esp, (4 * %$ii)
	%pop
	%endm

	%macro StdCall 1-*
	%push _StdCall_
		%define %%proc %1
		%rep %0-1
			%rotate -1
			Push DWORD %1
		%endrep
		Call %%proc
	%pop
	%endm


SECTION .data

strHeader		DB "Content-type: text/html",10,10
		DB "<HTML><HEAD><TITLE>ENVIRONMENT</TITLE></HEAD><BODY>",10
		DB "<H1>Environment Variables</H1>",10
		DB "<TABLE width='100%' cellspacing='0' cellpadding='0' border='1'>",10
strHeader_size	EQU ($-strHeader)

strFooter		DB "</TABLE></BODY></HTML>",10
strFooter_size	EQU ($-strFooter)

strRowOpen	DB "<TR><TD>"
strRowOpen_size	EQU ($-strRowOpen)
strRowMid		DB "</TD><TD>"
strRowMid_size	EQU ($-strRowMid)
strRowClose	DB "</TD></TR>", 10
strRowClose_size	EQU ($-strRowClose)

SECTION .text

StringLength:
STRUC SLA
.string	RESD 1
ENDSTRUC
	Push Ebp
	Mov Ebp, Esp
	
		Push Ecx
		Push Edi
		
			Xor Ecx, Ecx
			Not Ecx
			Xor Eax, Eax
			Mov Edi, [@ARG + SLA.string]
		Repne	Scasb
			Not Ecx
			Dec Ecx
			Mov Eax, Ecx
		
		Pop Edi
		Pop Ecx
	
	Leave
	Ret SLA_size

MakeKeyVal:
STRUC MKVA
.addr	RESD 1
ENDSTRUC
	Push Ebp
	Mov Ebp, Esp

		Cld
		Or Ecx, -1	; no scan limit (caller's Ecx is only the loop counter)
		Mov Edi, [@ARG + MKVA.addr]
		Mov Al, '='
	Repne	Scasb
		Dec Edi
		Mov BYTE [Edi], 0

		Mov Eax, [@ARG + MKVA.addr]
		Mov Edx, Edi
		Inc Edx
	Leave
	Ret MKVA_size

PrintRow:
STRUC PRA
.key	RESD 1
.value	RESD 1
ENDSTRUC
	Push Ebp
	Mov Ebp, Esp

		;; --[ Row Open ]--

		Mov Edx, strRowOpen_size
		Mov Ecx, strRowOpen
		Mov Ebx, STDOUT_FILENO
		Mov Eax, SYS_WRITE
		Int 80h

		;; --[ ROW_ARGS.key ]--

		StdCall StringLength, [@ARG + PRA.key]
		Mov Edx, Eax
		Mov Ecx, [@ARG + PRA.key]
		Mov Ebx, STDOUT_FILENO
		Mov Eax, SYS_WRITE
		Int 80h

		;; --[ Row Mid ]--

		Mov Edx, strRowMid_size
		Mov Ecx, strRowMid
		Mov Ebx, STDOUT_FILENO
		Mov Eax, SYS_WRITE
		Int 80h

		;; --[ ROW_ARGS.value ]--

		StdCall StringLength, [@ARG + PRA.value]
		Mov Edx, Eax
		Mov Ecx, [@ARG + PRA.value]
		Mov Ebx, STDOUT_FILENO
		Mov Eax, SYS_WRITE
		Int 80h

		;; --[ Row Close ]--

		Mov Edx, strRowClose_size
		Mov Ecx, strRowClose
		Mov Ebx, STDOUT_FILENO
		Mov Eax, SYS_WRITE
		Int 80h

	Leave
	Ret PRA_size

Main:
STRUC MA
.argc	RESD	1
.argv	RESD	1
.envp	RESD	1
ENDSTRUC
	Push Ebp
	Mov Ebp, Esp

		;; --[ Print CGI Header ]--

		Mov Edx, strHeader_size
		Mov Ecx, strHeader
		Mov Ebx, STDOUT_FILENO
		Mov Eax, SYS_WRITE
		Int 80h

		;; --[ Count Environment Vars ]--

		Mov Esi, [@ARG + MA.envp]
		Xor Ecx, Ecx
		Xor Eax, Eax
		.scan_env:
			Mov Ebx, [Esi + Ecx]
			Or Ebx, Ebx
			Jz .done
			Add Ecx, 4
			Inc Eax
			Jmp .scan_env
		.done:

		;; --[ Print Each Variable ]--

		Mov Ecx, Eax

		.for_each:
			Push Ecx
			Push Esi

				StdCall MakeKeyVal, [Esi]
				StdCall PrintRow, Eax, Edx

			Pop Esi
			Pop Ecx
			Add Esi, 4
			Loop .for_each
		.end_for:

		;; --[ Print CGI Footer ]--

		Mov Edx, strFooter_size
		Mov Ecx, strFooter
		Mov Ebx, STDOUT_FILENO
		Mov Eax, SYS_WRITE
		Int 80h

	Xor Eax, Eax
	Leave
	Ret MA_size

GLOBAL _start
_start:
	Pop Ecx
	Mov Esi, Esp
	Push Ecx
	Lea Eax, [(Esi + 4) + (Ecx * 4)]

	;; --[ Explanation ]--
	; We pop the argument count off the stack into Ecx
	; and save the argument list in Esi (via Esp). Next,
	; we restore the argument count to the stack and
	; load Eax with the environment list.
	; The environment variables are located just after
	; the last command line argument, we can find this
	; by multiplying 4 by our argument count and adding
	; that to the sum of 4 and the address of the base of
	; our argument list (we add four to the argument list
	; to compensate for a null separator). This results
	; in the following layout:
	;
	; Ecx - int argc
	; Esi - char * argv[]
	; Eax - char * envp[]
	;; --

	StdCall Main, Ecx, Esi, Eax

	Mov Ebx, Eax
	Mov Eax, SYS_EXIT
	Int 80h

Hope this helps.

Regards,
Bryant Keller

lukus001 · « **Reply #8 on:** February 20, 2010, 07:51:29 PM »

Hi guys, thanks again for your responses, much appreciated.

Frank, I added a32 when I had some problems on my first lodsb line when I put it through gdb, but I've just removed it and had no problems. I'm using ESI to point to a 32-bit address, so doesn't it need a32 to say I want to load the first byte from ESI?

You were spot on about the problems with null_b. I wasn't even testing for the null that marks the end of the global variables; I was simply treating it as a pointer (a pointer to the address 'null'). Looks like I also made a lot of mistakes with which labels were which as well...

One thing I am slightly confused about is things like "lea eax,[reg+reg+#*#]". The original code, which I got from that first site you linked me to, had mov reg,[esp+(eax+1)*4]. This gave me segfaults until I changed it to lea, but I still don't quite understand the mechanics.

I know esp + eax gives the offset based on the number of arguments, so then argument list + program name + null = 3 address slots = +3, and *4 so it becomes 12 bytes (3 dwords). I assume the *# has no effect on registers, just the numerics?

Bryant, thanks for your response. I will have a good look at your code; I think it might be a little too advanced for me at the moment, but I'm sure it'll be educational.

Thanks again guys, it is very much appreciated!
cheers.

Frank Kotler · « **Reply #9 on:** February 21, 2010, 08:26:54 AM »

Hi Luke,

The program name, "argv[0]", is included in argc - it's always at least 1. So "argc * 4" includes that, and we don't need to skip over it separately. We want to skip argc itself, and the dword 0 that ends the command line arguments. But we may be doing "add reg, 4" as the first part of a "get next variable" loop, so might only want to add 4 here.

An "effective address", whether it's an operand to "lea" or not, consists of an optional "offset" or "displacement" - just a number - plus an optional base register, which can be any (32-bit!) general purpose register, plus an optional index register, which can be any GP register but esp, which may be multiplied by an optional scale, which may be 1, 2, 4, or 8.

Nasm will do a little "algebra", so you can write "lea eax, [eax * 5]", and Nasm can figure out that that can be legally expressed as "lea eax, [eax + eax * 4]" (eax is both the base reg and the index reg... and the destination!). It isn't arbitrary - "[eax * 6]" wouldn't work! I prefer to try to write these things so they "look" like a valid effective address. I'd write "lea esi, [8 + esp + eax * 4]" rather than "lea esi, [esp + (eax + 2) * 4]", although they're the same thing. (okay, I might put the "+8" at the end...)
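The "algebra" rule can be stated as a small predicate. This is a sketch of the reasoning only, not of how Nasm is implemented; the function names are made up, and the register is assumed not to be esp (which cannot be an index).

```c
#include <stdbool.h>

/* True if s is a scale the hardware supports on an index register. */
bool valid_scale(unsigned s)
{
    return s == 1 || s == 2 || s == 4 || s == 8;
}

/* True if "[reg * k]" can be rewritten as a legal effective address:
 * either index * k directly, or base + index * (k - 1) with the same
 * register serving as both base and index. Assumes reg is not esp. */
bool can_encode_multiple(unsigned k)
{
    return valid_scale(k) || (k > 1 && valid_scale(k - 1));
}
```

So multiples of 1, 2, 3, 4, 5, 8 and 9 pass, while 6 and 7 do not: 6 is not a scale, and 5 is not one either, so there is no base-plus-index form for it.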

The difference between "lea" and "mov" is that "lea" just calculates the address and puts it in the destination register. "mov" actually fetches the contents of that memory. As such, "lea" can result in some number which isn't in "your" address space, where "mov" would segfault. Despite the fact that it requires a valid effective address as a source operand, "lea" doesn't touch memory, it just does arithmetic.
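A loose C analogy, for what it's worth: "lea" is like computing an address as a plain number, while "mov reg, [addr]" is like dereferencing it. The names lea_like and mov_like are invented for the illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Like "lea": pure arithmetic on an address-sized number. Nothing is
 * read from memory, so any result is harmless, valid address or not. */
uintptr_t lea_like(uintptr_t base, uintptr_t index, unsigned scale, uintptr_t disp)
{
    return base + index * scale + disp;
}

/* Like "mov reg, [addr]": actually reads the dword stored there, so the
 * address had better belong to the program. */
uint32_t mov_like(const uint32_t *addr)
{
    return *addr;
}
```

Calling lea_like(0, 5, 4, 8) simply yields 28; passing that result to mov_like() would be the C equivalent of the segfault.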

As for "a32" and friends, you should rarely need them. Nasm knows you want "lodsb" to use esi by virtue of being told "-f elf" (or "-f elf32", a more explicit alias). If you only wanted "lodsb" to use si, you'd use "a16".

Intel, when they went from 16-bit to 32-bit, employed a clever trick, or a horrible kludge, depending on your viewpoint. They used the same opcodes! "mov ax, bx" and "mov eax, ebx" are the exact same opcode. If the CPU is in 16-bit mode, it does "mov ax, bx", and if it's in 32-bit mode, "mov eax, ebx". If you want the "opposite", you can prefix the opcode with the "operand size prefix", 0x66. The "address size prefix" does the same for addresses. Generally, Nasm knows when you need these, and emits them, but for cases where it doesn't - as would be the case for "lodsb" - "a32", "a16", "o32", and "o16" can be used. If we're already doing 32-bit code, "a32" (or "o32") would do nothing, where "a16" would emit 0x67, and "o16" would emit 0x66, before the opcode. "-f bin" and "-f obj" default to doing 16-bit code, so "a32" or "o32" would emit 0x67 or 0x66, and "a/o16" would do nothing.

"lodsb" is simple (0xAC), but "lodsw" and "lodsd" are the same opcode (0xAD), the difference being the 0x66 operand size override. In this case, Nasm knows where to put it without being told. But to toggle si/esi to the "off size", you'd use "a16" or "a32". Not often useful.

That's a lot, to explain that "a32" doesn't do anything.
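The opcode sharing described above can be written out as a tiny encoder for the "lods" family. It is a sketch covering only the operand-size prefix in 16-bit and 32-bit modes (the 0x67 address-size prefix is left out), and encode_lods is a made-up name.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch: bytes for lodsb/lodsw/lodsd given the default mode of the code
 * (16 or 32) and the wanted operand size in bits (8, 16 or 32).
 * Writes up to 2 bytes into out; returns the count, or 0 on bad input. */
size_t encode_lods(int mode_bits, int operand_bits, uint8_t out[2])
{
    size_t n = 0;

    if ((mode_bits != 16 && mode_bits != 32) ||
        (operand_bits != 8 && operand_bits != 16 && operand_bits != 32))
        return 0;

    if (operand_bits == 8) {
        out[n++] = 0xAC;            /* lodsb: operand-size prefix not applicable */
        return n;
    }
    if (operand_bits != mode_bits)
        out[n++] = 0x66;            /* operand-size override: the "off size" */
    out[n++] = 0xAD;                /* lodsw and lodsd share this opcode */
    return n;
}
```

In 32-bit code this gives AD for lodsd and 66 AD for lodsw; in 16-bit code the two swap.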

Best,
Frank

lukus001 · « **Reply #10 on:** February 22, 2010, 12:49:31 AM »

Hi Frank,

Thank you once again for your response.

Everything is crystal clear now!

Kind regards,
Luke.

NASM - The Netwide Assembler
