Topic: Linux programming.

lukus001
« on: February 14, 2010, 03:08:27 PM »
Hi all.

I've started getting into Linux programming, and one of the things I wanted to do recently was read environment variables. The Linux documentation simply tells me to use "getenv()", which says nothing about how Linux actually manages environment variables.

getenv() is part of libc (or similar; I spent days searching around last week), which doesn't really explain how it's done; it simply produces a result... I've spent a few days searching, reading the header files on my Linux system, the man pages and various other resources, with no success.

I was referred to http://linuxasmtools.net in a previous topic I made when I was just a guest, so I had a look at the library they have there, and luckily there are a few examples of accessing environment variables. Reading them, I can piece some of it together, though not completely.

Obviously, the person who wrote that asm library knows what they're doing, but how does someone like me find information/documentation on the Linux system that's a bit more detailed than "use getenv()"? So that I know things like where environment variables are stored, how they are accessed and so on? Or how can I find out?

Cheers.

Keith Kanios
« Reply #1 on: February 14, 2010, 07:31:01 PM »
getenv() is the "direct" approach of accessing the name/value environment variable pairs.

As illustrated here, when Linux loads an ELF binary it also places extra data on the stack, such as the environment variable pointers... this is what the referenced AsmLib relies on.
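To make "getenv() is the direct approach" concrete: POSIX also exposes that array of "KEY=value" pointers to C programs as the global `environ`. The sketch below walks it by hand, which is roughly what getenv() does. The function name `lookup_via_environ` is hypothetical, and the sketch assumes a POSIX system (Linux with glibc or similar) where `setenv()` keeps `environ` up to date.

```c
#define _POSIX_C_SOURCE 200809L /* for setenv(), used in the usage example */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* POSIX: the environment as a NULL-terminated array of "KEY=value" strings.
   It starts out as the array the kernel placed on the initial stack (libc
   may move it later, e.g. when setenv() needs room for more entries). */
extern char **environ;

/* Hypothetical helper: find NAME by walking environ, roughly what getenv()
   does. Returns a pointer to the value part, or NULL if NAME is not set. */
const char *lookup_via_environ(const char *name)
{
    size_t len = strlen(name);
    assert(len > 0 && strchr(name, '=') == NULL);

    for (char **p = environ; p != NULL && *p != NULL; ++p) {
        /* match the key AND the '=' right after it, so "USER" != "USERNAME" */
        if (strncmp(*p, name, len) == 0 && (*p)[len] == '=')
            return *p + len + 1;
    }
    return NULL;
}
```

After `setenv("SOME_VAR", "hello", 1)`, both `getenv("SOME_VAR")` and `lookup_via_environ("SOME_VAR")` should yield "hello".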

Keith Kanios
« Reply #2 on: February 14, 2010, 07:40:39 PM »
A couple more references that may help you piece things together.

1.) Environ (About.com)
2.) Writing Your Own Shell (Linux Gazette)

Frank Kotler (NASM Developer)
« Reply #3 on: February 15, 2010, 07:10:47 AM »
At the "_start:" label, "argc", the command line arguments, and the environment variables are on the stack. Since "_start:" is not "call"ed, there's no return address on the stack, the argument count is the first thing, followed by a zero-terminated array of pointers to zero-terminated command line arguments, followed by a zero-terminated array of pointers to zero-terminated "KEY=value" environment variables. There's no "envc", we have to watch for the terminating zero. We can either "pop" them off as we go along, or reference them "in place" - leaving them for future reference in case we forgot something. :)
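A C model of the layout just described may help. This is a sketch, not real startup code: it assumes the stack is represented as an array of pointer-sized slots (`char *` here) with argc stored in slot 0, and the names `startup_view`, `view_initial_stack` and `count_env` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The three things found at _start. */
typedef struct {
    long argc;
    char **argv; /* argv[argc] is the terminating NULL */
    char **envp; /* NULL-terminated; there is no "envc" */
} startup_view;

/* sp models the words at the initial stack pointer:
   sp[0] = argc, sp[1..argc] = argv pointers, sp[argc + 1] = NULL,
   then the environment pointers, then another NULL. */
startup_view view_initial_stack(char **sp)
{
    startup_view v;
    v.argc = (long)(uintptr_t)sp[0];  /* argc sits in a pointer-sized slot */
    v.argv = sp + 1;
    v.envp = v.argv + v.argc + 1;     /* skip argv pointers and their NULL */
    assert(v.argv[v.argc] == NULL);
    return v;
}

/* No envc: count by walking to the terminating NULL. */
size_t count_env(char **envp)
{
    size_t n = 0;
    while (envp[n] != NULL)
        ++n;
    return n;
}
```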

If C is involved, the "_start:" label is probably in the startup code, crt0.o or so, which rearranges the stack and calls "main". "main" probably does "push ebp" / "mov ebp, esp" first thing, so "argc" would be at [ebp + 8], [ebp + 12] would be "char **argv" - just a pointer to the array we already had - and [ebp + 16] would be "char **envp". I'm not sure this last is always there - is there a difference between "int main (int argc, char **argv, char **envp)" and just "int main (int argc, char **argv)"? Dunno. Would have to experiment, or read the code... if I cared. :)
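The offsets in the previous paragraph follow from the i386 cdecl convention: arguments are pushed right to left, "call" pushes a 4-byte return address, and the prologue pushes ebp. A tiny sketch of that arithmetic, assuming 32-bit code where each argument is one 4-byte word (`cdecl_arg_offset` is a hypothetical helper):

```c
#include <assert.h>
#include <stddef.h>

/* i386 cdecl after "push ebp / mov ebp, esp":
     [ebp + 0]          saved ebp
     [ebp + 4]          return address pushed by "call"
     [ebp + 8 + 4 * i]  argument i (0-based)
   so for main: argc at +8, argv at +12, envp (if passed) at +16. */
size_t cdecl_arg_offset(unsigned i)
{
    const size_t word = 4; /* assumes every argument is one 32-bit word */
    assert(i < 16);        /* arbitrary sanity bound for this sketch */
    return 2 * word + word * (size_t)i;
}
```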

Here's my attempt at what "getenv" would do, if it were hard-coded to "USER".

Code:
; nasm -f elf myfile.asm
; ld -o myfile myfile.o

; finds the USER environment variable, and says hi,
; using sys_writev - write several buffers at a time.

global _start

section .text
_start:
    nop
commence:
    mov eax, [esp] ; "argc"

; "lea eax, [8 + esp + eax * 4]" is start of environment variables
; 4 bytes for each arg in argc, plus argc itself, plus the terminating 0
; since we're going to be adding 4, calculate 4 short of that.
    lea eax, [4 + esp + eax * 4]

; fancy search algorithm :)
finduser:
    add eax, byte 4
    mov ecx, [eax]
    test ecx, ecx
    jz end_of_env ; terminating zero at end of env array - not found
    cmp dword [ecx], 'USER'
    jne finduser
    cmp byte [ecx + 4], '='
    jne finduser

; got it, advance to "value"
    add ecx, byte 5

; find the length
    or edx, byte -1
getlen:
    inc edx
    cmp byte [ecx + edx], 0
    jne getlen

    mov [username], ecx
    mov [name_len], edx
    jmp short write_it

end_of_env:
    mov [username], dword whodat
    mov [name_len], dword whodat_len

write_it:
    mov eax, 146 ; __NR_writev
    mov ebx, 1 ; stdout
    mov ecx, my_vector ; ptr, len, ptr, len...
    mov edx, 3 ; three items to write
    int 80h

exit:
    xor ebx, ebx ; exit status 0 (ebx still held 1, the stdout fd)
    mov eax, 1 ; __NR_exit
    int 80h

;-----------------------------------
section .data

    greet db "Hello, "
    greet_len equ $ - greet
   
    coda db "! Welcome to Linux Assembly!", 10
    coda_len equ $ - coda
   
    whodat db "Unknown User"
    whodat_len equ $ - whodat
   
; sys_writev takes a "vector", ptr, len, ptr, len... N times.
    my_vector:
dd greet
dd greet_len
username:
        dd 0 ; fill in at runtime
name_len:
        dd 0 ; fill in at runtime
        dd coda
        dd coda_len

;------------------------------

I don't know if that's any help or not. If we start with "main", finding the array would be slightly different, but after that it would be much the same. Introducing sys_writev is an irrelevant complication - I just wanted to try it out. Dunno if it's faster than calling sys_write three times or not. Should be.
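For comparison, here is roughly the same program in C, as a sketch: a hand-rolled search of an envp-style array (checking for the '=' so that "USERNAME" does not match "USER") and a single `writev()` call with three `struct iovec` entries, mirroring `my_vector`. The names `find_env_value` and `greet_user` are hypothetical; `writev()` and `struct iovec` come from `<sys/uio.h>`.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Same search as the assembly: walk the "KEY=value" pointers until the
   NULL that ends the array, and require '=' right after the key. */
const char *find_env_value(char **envp, const char *key)
{
    size_t klen = strlen(key);
    for (; *envp != NULL; ++envp)
        if (strncmp(*envp, key, klen) == 0 && (*envp)[klen] == '=')
            return *envp + klen + 1;
    return NULL;
}

/* Write "Hello, <USER>! Welcome to Linux Assembly!\n" with one writev():
   three (pointer, length) pairs, like my_vector. Returns writev's result. */
ssize_t greet_user(int fd, char **envp)
{
    static const char greet[] = "Hello, ";
    static const char coda[] = "! Welcome to Linux Assembly!\n";
    const char *name = find_env_value(envp, "USER");
    if (name == NULL)
        name = "Unknown User";

    struct iovec vec[3] = {
        { .iov_base = (void *)greet, .iov_len = sizeof greet - 1 }, /* no NUL */
        { .iov_base = (void *)name,  .iov_len = strlen(name) },
        { .iov_base = (void *)coda,  .iov_len = sizeof coda - 1 },
    };
    assert(fd >= 0);
    return writev(fd, vec, 3);
}
```

Called as `greet_user(1, envp)` from a `main(int argc, char **argv, char **envp)`, it would print the same greeting as the assembly version.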

Best,
Frank


lukus001
« Reply #4 on: February 15, 2010, 01:17:47 PM »
Thank you Keith and Frank for your responses; much appreciated.

I believe I should be able to do what I wanted now, so many thanks.

I also found out about /proc/self, which contains a file called 'environ' (a method that one of the asm lib samples uses); it's basically the same, but more kernel-oriented, if that's a correct way to describe it. So while I should be able to write my program now thanks to you two :3 I'm wondering where I can find information about Linux so I'll know about things like environment variables being on the stack, /proc/self and so on; then, should I need to do something else in the future, I can be self-sufficient...
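For reference, proc(5) describes /proc/self/environ as the initial environment of the process: the same "KEY=value" strings, each terminated by a NUL byte instead of being reached through an array of pointers. It reflects the environment as it was at exec time, so later setenv()/putenv() changes do not show up there. A minimal C sketch for reading and counting the entries (both function names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Count "KEY=value" entries in a buffer laid out like /proc/self/environ:
   each entry ends in a NUL byte. A final entry missing its NUL (for
   example because the read was cut short) is counted as well. */
size_t count_nul_entries(const char *buf, size_t len)
{
    size_t n = 0;
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; ++i)
        if (buf[i] == '\0')
            ++n;
    if (len > 0 && buf[len - 1] != '\0')
        ++n;
    return n;
}

/* Read up to cap bytes of /proc/self/environ into buf. Returns the number
   of bytes read, or -1 if the file cannot be opened (no /proc, no access). */
long read_proc_environ(char *buf, size_t cap)
{
    FILE *f = fopen("/proc/self/environ", "rb");
    if (f == NULL)
        return -1;
    size_t got = fread(buf, 1, cap, f);
    fclose(f);
    return (long)got;
}
```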

Any books anyone can recommend?

Cheers,
Luke.


lukus001
« Reply #5 on: February 19, 2010, 11:29:18 PM »
Heya all,

I wrote a program that *should* read each global (environment) variable, put each one on its own line, and send the result to standard out, to be run as a CGI script (it has a MIME type header too).

I had already written a program that dumped the global variables via CGI; it was quite crude, but at least it worked :3. Now this newer version produces a SIGSEGV (segmentation fault) at 'a32 lodsb' on line 78. ESI is set to the first global var pointer (or it should be), and the earlier lodsb retrieves its byte fine, but the second lodsb doesn't.

Please advise :3

(Find "null_check_b:" to quickly get to the specific problem line.)

I'm sure there will be more errors after that one is fixed, too :(
Code:
section .text
global _start

_start:


;Apache requires the Mime type to be specified, otherwise the program will fail
;Pre-declare Mime-type in text section, ending it with a double linefeed (10)
;"mime" for mime type and "mime_Length" for length

;Apache requires output Via STD-out,
;eax, ~ system write - 4
;ebx, ~ STD-out - 1
;ecx, ~ pointer to message
;edx, ~ length of message
;int ~ interrupt to invoke kernel - 0x80

mov eax,4
        mov ebx,1
        mov      ecx,mime
        mov      edx,mime_length
        int      0x80



;Now Apache has the basics, start main code
;We want to read the Global variables one by one and print them on their own line
;A pointer to the global var is already on the stack, as per ELF Standards.

;first value on stack is the number of arguments (argc) itself
;Second value onwards are pointers to the actual arguments.
;Null designates the end of argument pointers
;Value after Null is start of Global variables.

;use number of arguments as an offset to ESP to locate start of global vars
;keep a copy of global_vars starting address in global_point
;Store pointer to first global var in ESI
;then get the value, 


mov eax,[esp]
lea eax,[esp+(eax+3)*4]

mov [global_point],eax

mov esi,[eax]
cld
a32 lodsb

;al now has the first byte from global vars, now we must check if it contains null.
;We need a place to start storing our data, so declare some Uninitialised space.
;name it global_vars


;Set edi to our global_vars storage area
;Ascii is an 8bit text format, so we have to check it by the byte.

;null_check_a
;if the start of a pointer is null, no more global variables, end program.
;if null isnt found, store byte and run _b

;null_check_b
;stores each byte until it hits null, signifying the end of that variable.
;run null_b to load next global variable.


mov edi,global_vars
mov DWORD [store_vars_start],global_vars

null_check_a:
test al,al
jz null_a
a32 stosb

null_check_b:

a32 lodsb

test al,al
jz null_b
a32 stosb
jmp null_check_b



;null_a
;Represents end of all global variables, add newline to global_var then run STD-out

;null_b
;represents end of current global variable, looping till it finds the end,
;Add new line to end

;global_point has the original reference to the beginning of global variables
;increase it by one Dword so it points to the next value
;Store new value into esi, ready to run null_check_a



null_a:
mov al,10
a32 stosb
jmp end



null_b:
mov al,10
a32 stosb
mov ebx,[global_point]
lea ebx,[ebx+1*4]
mov [global_point],ebx
mov edi,[global_point]
jmp null_check_a



end:
mov edx,global_vars
sub DWORD edx,store_vars_start

;eax, ~ system write - 4
;ebx, ~ STD-out - 1
;ecx, ~ pointer to message
;edx, ~ length of message
;int ~ interrupt to invoke kernel - 0x80

mov eax,4
        mov DWORD ebx,1
        mov      ecx,store_vars_start
        ;mov      edx
        int      0x80

       mov     eax,1   ;system call number (sys_exit)
       int     0x80    ;call kernel




section .data


        mime: db 'Content-type: text/html',10,10,'Hello World!',10
        mime_length equ $ - mime    ; a constant, so "mov edx,mime_length" loads the length (38), not an address

section .bss

global_point: resb 4

global_vars: resb 16000

store_vars_start: resb 4




Cheers,
Luke



Frank Kotler (NASM Developer)
« Reply #6 on: February 20, 2010, 02:53:43 AM »
Well... "a32" isn't necessary on the "lodsb"s... but that isn't your problem. Your first "lodsb" is intended to be part of "null_check_a", I think. Works okay the first time through, but when you return to it, the 0xa is still in al from "null_b", and you're never going to find a null there! (It also causes 0xa to be stored twice, giving you a double-spaced effect.) You run esi all the way up to 0xC0000000 and segfault.

Then, in "null_b", there's a typo (I think). You put your "result" in edi - surely you meant esi. But you're not quite ready for that yet. Your "global_point" holds an address in the array of pointers on the stack, not a pointer to the string itself. Initially, you do mov esi, [eax] to "dereference" it (get the actual pointer to the string from our array of pointers). You need to do that again here... and it would be a good time to check whether we've hit the null pointer that terminates the array.

Code:
null_b:
mov al,10
a32 stosb
mov ebx,[global_point]
lea ebx,[ebx+1*4]
mov [global_point],ebx
mov esi,[global_point]
mov esi, [esi]
test esi, esi
jz end
lodsb ; reload al from the new string - al still holds the 0xa
jmp null_check_a

I think that'll work better than expecting a pointer to a null string at the end.

Lessee, I don't think you're calculating the length to print correctly:

Code:
end:
; mov edx,global_vars
; sub DWORD edx,store_vars_start
mov edx, edi  ; end of our "copy"
sub edx, global_vars ; start of our "copy"

"store_vars_start" would be the address. "[store_vars_start]" would be the "[contents]"... but it's just "global_vars" so the subtraction is going to be zero. (you want the "[]"s in the final print routine, too - or just use "global_vars")
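Putting the two fixes together (stop at the NULL pointer, and compute the length as end of copy minus start of copy), the loop looks roughly like this in C. This is an illustrative sketch, not Luke's program: `copy_env_lines` is a hypothetical name, `dst` stands in for edi, and it adds a capacity check that the assembly version does not have.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: copy each "KEY=value" string from envp into out,
   one per line. Stops at the NULL pointer that ends envp (test esi, esi /
   jz end), and returns end-of-copy minus start-of-copy, like
   "mov edx, edi / sub edx, global_vars". Truncates if cap is too small. */
size_t copy_env_lines(char **envp, char *out, size_t cap)
{
    char *dst = out;                 /* plays the role of edi */
    char *const limit = out + cap;
    assert(envp != NULL && out != NULL);

    for (; *envp != NULL; ++envp) {  /* next pointer; NULL means done */
        /* lodsb / stosb until the NUL that ends this string */
        for (const char *src = *envp; *src != '\0' && dst < limit; ++src)
            *dst++ = *src;
        if (dst < limit)
            *dst++ = '\n';           /* one newline per variable */
    }
    return (size_t)(dst - out);
}
```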

Oh yeah, back at the beginning...

Code:
; lea eax,[esp+(eax+3)*4]
lea eax,[esp + eax * 4 + 8]

"(eax+3)*4" multiplies out to "eax * 4 + 12", which I don't think is what you want, unless you intended to skip the first environment variable(?). The "eax * 4" skips the command line arguments (there's always at least one - the program name), and you want to skip "argc" itself, and the null that terminates the command line arguments... so +8 should be the start of environment variables.
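The same count, written as arithmetic: one slot for argc, argc slots for argv (argv[0], the program name, is already included), and one slot for the NULL after argv. The sketch assumes 4-byte slots for 32-bit code (8 on x86-64); `envp_offset` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte offset from the stack pointer at _start to
   the first envp slot. Slots: argc itself, argc argv pointers (argv[0]
   included), and the NULL that ends argv. */
size_t envp_offset(size_t argc, size_t slot)
{
    assert(slot == 4 || slot == 8);
    return slot * (1 + argc + 1);
}
```

With only the program name (argc = 1) that gives 12 bytes, i.e. `[8 + esp + eax * 4]` with eax = 1, while "(eax+3)*4" would give 16, one slot into the environment array.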

There may (as always) be further errors...

Best,
Frank


Bryant Keller (Forum Moderator)
« Reply #7 on: February 20, 2010, 07:42:51 AM »
Working with environment variables isn't much harder than working with command line arguments. Here is a short program that displays the environment variables in an HTML table. I tried to comment the relevant parts and used macros to hide the parts that don't really matter to what you are trying to figure out.

Code:
BITS 32

STDOUT_FILENO EQU 1

SYS_EXIT EQU 1
SYS_WRITE EQU 4

%define @ARG Ebp + 8
%define @VAR Ebp - 8

%macro CCall 1-*
%push _CCall_
%define %%proc %1
%assign %$ii 0
%rep %0-1
%rotate -1
Push DWORD %1
%assign %$ii %$ii + 1
%endrep
Call %%proc
Add Esp, (4 * %$ii)
%pop
%endm

%macro StdCall 1-*
%push _StdCall_
%define %%proc %1
%rep %0-1
%rotate -1
Push DWORD %1
%endrep
Call %%proc
%pop
%endm


SECTION .data

strHeader DB "Content-type: text/html",10,10
DB "<HTML><HEAD><TITLE>ENVIRONMENT</TITLE></HEAD><BODY>",10
DB "<H1>Environment Variables</H1>",10
DB "<TABLE width='100%' cellspacing='0' cellpadding='0' border='1'>",10
strHeader_size EQU ($-strHeader)

strFooter DB "</TABLE></BODY></HTML>",10
strFooter_size EQU ($-strFooter)

strRowOpen DB "<TR><TD>"
strRowOpen_size EQU ($-strRowOpen)
strRowMid DB "</TD><TD>"
strRowMid_size EQU ($-strRowMid)
strRowClose DB "</TD></TR>", 10
strRowClose_size EQU ($-strRowClose)

SECTION .text

StringLength:
STRUC SLA
.string RESD 1
ENDSTRUC
Push Ebp
Mov Ebp, Esp

Push Ecx
Push Edi

Xor Ecx, Ecx
Not Ecx
Xor Eax, Eax
Mov Edi, [@ARG + SLA.string]
Repne Scasb
Not Ecx
Dec Ecx
Mov Eax, Ecx

Pop Edi
Pop Ecx

Leave
Ret SLA_size

MakeKeyVal:
STRUC MKVA
.addr RESD 1
ENDSTRUC
Push Ebp
Mov Ebp, Esp

Cld
Mov Edi, [@ARG + MKVA.addr]
Xor Ecx, Ecx
Not Ecx ; Ecx = -1: no length limit (Ecx held the caller's loop counter)
Mov Al, '='
Repne Scasb
Dec Edi
Mov BYTE [Edi], 0

Mov Eax, [@ARG + MKVA.addr]
Mov Edx, Edi
Inc Edx
Leave
Ret MKVA_size

PrintRow:
STRUC PRA
.key RESD 1
.value RESD 1
ENDSTRUC
Push Ebp
Mov Ebp, Esp

;; --[ Row Open ]--

Mov Edx, strRowOpen_size
Mov Ecx, strRowOpen
Mov Ebx, STDOUT_FILENO
Mov Eax, SYS_WRITE
Int 80h

;; --[ ROW_ARGS.key ]--

StdCall StringLength, [@ARG + PRA.key]
Mov Edx, Eax
Mov Ecx, [@ARG + PRA.key]
Mov Ebx, STDOUT_FILENO
Mov Eax, SYS_WRITE
Int 80h

;; --[ Row Mid ]--

Mov Edx, strRowMid_size
Mov Ecx, strRowMid
Mov Ebx, STDOUT_FILENO
Mov Eax, SYS_WRITE
Int 80h

;; --[ ROW_ARGS.value ]--

StdCall StringLength, [@ARG + PRA.value]
Mov Edx, Eax
Mov Ecx, [@ARG + PRA.value]
Mov Ebx, STDOUT_FILENO
Mov Eax, SYS_WRITE
Int 80h

;; --[ Row Close ]--

Mov Edx, strRowClose_size
Mov Ecx, strRowClose
Mov Ebx, STDOUT_FILENO
Mov Eax, SYS_WRITE
Int 80h

Leave
Ret PRA_size

Main:
STRUC MA
.argc RESD 1
.argv RESD 1
.envp RESD 1
ENDSTRUC
Push Ebp
Mov Ebp, Esp

;; --[ Print CGI Header ]--

Mov Edx, strHeader_size
Mov Ecx, strHeader
Mov Ebx, STDOUT_FILENO
Mov Eax, SYS_WRITE
Int 80h

;; --[ Count Environment Vars ]--

Mov Esi, [@ARG + MA.envp]
Xor Ecx, Ecx
Xor Eax, Eax
.scan_env:
Mov Ebx, [Esi + Ecx]
Or Ebx, Ebx
Jz .done
Add Ecx, 4
Inc Eax
Jmp .scan_env
.done:

;; --[ Print Each Variable ]--

Mov Ecx, Eax
Jecxz .end_for ; no variables: skip the loop (Loop with Ecx = 0 would run 2^32 times)

.for_each:
Push Ecx
Push Esi

StdCall MakeKeyVal, [Esi]
StdCall PrintRow, Eax, Edx

Pop Esi
Pop Ecx
Add Esi, 4
Loop .for_each
.end_for:

;; --[ Print CGI Footer ]--

Mov Edx, strFooter_size
Mov Ecx, strFooter
Mov Ebx, STDOUT_FILENO
Mov Eax, SYS_WRITE
Int 80h

Xor Eax, Eax
Leave
Ret MA_size

GLOBAL _start
_start:
Pop Ecx
Mov Esi, Esp
Push Ecx
Lea Eax, [(Esi + 4) + (Ecx * 4)]

;; --[ Explanation ]--
; We pop the argument count off the stack into Ecx
; and save the argument list in Esi (via Esp). Next,
; we restore the argument count to the stack and
; load Eax with the environment list.
; The environment variables are located just after
; the last command line argument, we can find this
; by multiplying 4 by our argument count and adding
; that to the sum of 4 and the address of the base of
; our argument list (we add four to the argument list
; to compensate for a null separator). This results
; in the following layout:
;
; Ecx - int argc
; Esi - char * argv[]
; Eax - char * envp[]
;; --

StdCall Main, Ecx, Esi, Eax

Mov Ebx, Eax
Mov Eax, SYS_EXIT
Int 80h

Hope this helps.

Regards,
Bryant Keller
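A C sketch of the per-variable step in the program above, for readers who find the macros hard going. Unlike MakeKeyVal, it splits at the first '=' without writing a 0 into the environment string; values may themselves contain '=', so only the first one separates key from value. `split_key_value` and `print_row` are hypothetical names, and, like the assembly, the sketch does no HTML escaping.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Split "KEY=value" at the FIRST '=' without modifying the string.
   Returns 0 and fills key_len/value, or -1 if there is no '='. */
int split_key_value(const char *entry, size_t *key_len, const char **value)
{
    assert(entry != NULL);
    const char *eq = strchr(entry, '=');
    if (eq == NULL)
        return -1;
    *key_len = (size_t)(eq - entry);
    *value = eq + 1;
    return 0;
}

/* Print one table row like PrintRow. Returns fprintf's result,
   or -1 if the entry has no '='. Values are NOT HTML-escaped. */
int print_row(FILE *out, const char *entry)
{
    size_t klen;
    const char *value;
    if (split_key_value(entry, &klen, &value) != 0)
        return -1;
    return fprintf(out, "<TR><TD>%.*s</TD><TD>%s</TD></TR>\n",
                   (int)klen, entry, value);
}
```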


lukus001
« Reply #8 on: February 20, 2010, 07:51:29 PM »
Hi guys thanks again for your responses, much appreciated.

Frank, I added a32 when I had some problems with my first lodsb line when I put it through gdb, but I just removed it and had no problems. I'm using ESI to point to a 32-bit address, so doesn't it need a32 to say I want to load the first byte from ESI?

You were spot on about the problems with null_b. I wasn't even testing for the null that marks the end of the global variables; I was simply treating it as a pointer (a pointer to address 'null'). Looks like I also mixed up which labels were which, as well...

One thing I'm slightly confused about is things like "lea eax,[reg+reg+#*#]". The original code I got, from that first site you linked me to, had mov reg,[esp+(eax+1)*4]. This gave me seg faults until I changed it to lea, but I still don't quite understand the mechanics.

I know esp + eax gives the offset based on the number of arguments, so then arguments list + program name + null = 3 address spaces = +3, and *4 makes it 12 bytes (3 dwords). I assume the *# has no effect on registers, just the numerics?

Bryant, thanks for your response. I will have a good look at your code; I think it might be a little too advanced for me at the moment, but I'm sure it'll be educational.

Thanks again guys, it is very much appreciated!
cheers.

Frank Kotler (NASM Developer)
« Reply #9 on: February 21, 2010, 08:26:54 AM »
Hi Luke,

The program name, "argv[0]", is included in argc - it's always at least 1. So "argc * 4" includes that, and we don't need to skip over it separately. We want to skip argc itself, and the dword 0 that ends the command line arguments. But we may be doing "add reg, 4" as the first part of a "get next variable" loop, so might only want to add 4 here.

An "effective address", whether it's an operand to "lea" or not, consists of an optional "offset" or "displacement" - just a number - plus an optional base register, which can be any (32-bit!) general purpose register, plus an optional index register, which can be any GP register but esp, which may be multiplied by an optional scale, which may be 1, 2, 4, or 8.

Nasm will do a little "algebra", so you can write "lea eax, [eax * 5]", and Nasm can figure out that that can be legally expressed as "lea eax, [eax + eax * 4]" (eax is both the base reg and the index reg... and the destination!). It isn't arbitrary - "[eax * 6]" wouldn't work! I prefer to try to write these things so they "look" like a valid effective address. I'd write "lea esi, [8 + esp + eax * 4]" rather than "lea esi, [esp + (eax + 2) * 4]", although they're the same thing. (okay, I might put the "+8" at the end...)
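A small C sketch of the rule described here. It assumes an effective address of the form disp + base + index * scale with scale in {1, 2, 4, 8}, and asks which multipliers k let `reg * k` fit in a single effective address when only that one register is used (as index alone, or as both base and index). The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Can "lea reg, [...]" compute reg * k using only reg itself?
   Either reg is the index alone (k = scale) or it is both base and
   index (k = scale + 1). Scale must be 1, 2, 4 or 8. */
bool lea_can_multiply(unsigned k)
{
    static const unsigned scales[] = { 1, 2, 4, 8 };
    for (unsigned i = 0; i < 4; ++i)
        if (k == scales[i] || k == scales[i] + 1)
            return true;
    return false;
}

/* The arithmetic the CPU does for disp + base + index * scale (mod 2^32). */
uint32_t ea32(uint32_t disp, uint32_t base, uint32_t index, uint32_t scale)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return disp + base + index * scale;
}
```

It returns true for 1, 2, 3, 4, 5, 8 and 9, and false for 6 and 7, matching the "[eax * 5]" versus "[eax * 6]" example; `ea32(0, 7, 7, 4)` is 35, what "lea eax, [eax + eax * 4]" gives for eax = 7.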

The difference between "lea" and "mov" is that "lea" just calculates the address and puts it in the destination register. "mov" actually fetches the contents of that memory. As such, "lea" can result in some number which isn't in "your" address space, where "mov" would segfault. Despite the fact that it requires a valid effective address as a source operand, "lea" doesn't touch memory, it just does arithmetic.
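In C terms, "lea" corresponds to computing an address (done here on `uintptr_t`, so nothing is ever dereferenced), and "mov reg, [mem]" to actually loading through it. A sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Like "lea": compute base + index * scale + disp and nothing else.
   Using uintptr_t keeps it plain integer arithmetic, so any result is
   allowed, even an address that is not mapped in our process. */
uintptr_t lea_like(uintptr_t base, uintptr_t index, unsigned scale, intptr_t disp)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + (uintptr_t)disp;
}

/* Like "mov reg, [table + index * 4]": actually reads memory. This is
   the step that would fault if the address were not mapped. */
int32_t mov_like(const int32_t *table, uintptr_t index)
{
    return table[index];
}
```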

As for "a32" and friends, you should rarely need them. Nasm knows you want "lodsb" to use esi by virtue of being told "-f elf" (or "-f elf32", a more explicit alias). If you only wanted "lodsb" to use si, you'd use "a16".

Intel, when they went from 16-bit to 32-bit, employed a clever trick, or a horrible kludge, depending on your viewpoint. They used the same opcodes! "mov ax, bx" and "mov eax, ebx" are the exact same opcode. If the CPU is in 16-bit mode, it does "mov ax, bx", and if it's in 32-bit mode, "mov eax, ebx". If you want the "opposite", you can prefix the opcode with the "operand size prefix", 0x66. The "address size prefix" does the same for addresses. Generally, Nasm knows when you need these, and emits them, but for cases where it doesn't - as would be the case for "lodsb" - "a32", "a16", "o32", and "o16" can be used. If we're already doing 32-bit code, "a32" (or "o32") would do nothing, where "a16" would emit 0x67, and "o16" would emit 0x66, before the opcode. "-f bin" and "-f obj" default to doing 16-bit code, so "a32" or "o32" would emit 0x67 or 0x66, and "a/o16" would do nothing.

"lodsb" is simple (0xAC), but "lodsw" and "lodsd" are the same opcode (0xAD), the difference being the 0x66 operand size override. In this case, Nasm knows where to put it without being told. But to toggle si/esi to the "off size", you'd use "a16" or "a32". Not often useful.
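The prefix rules above can be summarised as a tiny encoder for the lods family. This is a sketch based only on the bytes named in this post (0xAC, 0xAD, 0x66, 0x67), not NASM's code; when both prefixes are needed, the order emitted here is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>

/* mode:  16 or 32, the BITS setting the code is assembled for
   width: 8, 16 or 32 (lodsb / lodsw / lodsd)
   addr:  16 (uses si) or 32 (uses esi)
   Writes up to 3 bytes into out and returns how many were written. */
size_t encode_lods(unsigned mode, unsigned width, unsigned addr, unsigned char out[3])
{
    size_t n = 0;
    assert(mode == 16 || mode == 32);
    assert(width == 8 || width == 16 || width == 32);
    assert(addr == 16 || addr == 32);

    if (addr != mode)
        out[n++] = 0x67;   /* address-size override: a16 in 32-bit code, a32 in 16-bit */
    if (width != 8 && width != mode)
        out[n++] = 0x66;   /* operand-size override: the "off size" operand */
    out[n++] = (width == 8) ? 0xAC : 0xAD;
    return n;
}
```

For example, `encode_lods(32, 8, 32, buf)` yields just AC (the "a32" adds nothing), while `encode_lods(32, 8, 16, buf)` yields 67 AC.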

That's a lot, to explain that "a32" doesn't do anything. :)

Best,
Frank


lukus001
« Reply #10 on: February 22, 2010, 12:49:31 AM »
Hi Frank,

Thank you once again for your response.

Everything is crystal clear now! 

Kind regards,
Luke.