nobody

  • Guest
how to deal with arguments
« on: February 06, 2008, 09:11:35 PM »
hello,
i'm planning on making an asm app for linux, that uses a file.
so there would be a line like that:
file:      db   "file_name.ext", 0

BUT, i want to have the file specified as an argument to my program, that i would start with the command line:

# ./my_program file_name.ext
how to handle the argument ?
more precisely, i want to know where in memory the file starts. i suppose that's just about choosing the right kernel call? :)

nobody

  • Guest
Re: how to deal with arguments
« Reply #1 on: February 06, 2008, 09:42:39 PM »
At the entrypoint, (_start, by default) argc is the first thing on your stack, followed by a NULL-terminated array of pointers to zero-terminated strings (followed by environment variables).

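_start:
    pop ebx          ; argc
    cmp ebx, 2       ; program name plus exactly one parameter?
    jnz usage
    pop ebx          ; argv[0] - our program name
    pop ebx          ; argv[1] - first c.l. parameter (pointer to the file name)
    mov eax, 5       ; __NR_open
    xor ecx, ecx     ; flags = O_RDONLY (0)
    int 80h          ; file descriptor (or negative error code) comes back in eax
    ...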

If the entry to your code is "main", you've got a return address on the stack, followed by "argc", "**argv" (and "**envp"). You'll probably want to save caller's ebp, so...

main:
    push ebp
    mov ebp, esp
    cmp dword [ebp + 8], 2   ; argc
    jnz usage
    mov eax, [ebp + 12]      ; argv
    mov ebx, [eax + 4]       ; argv[1] - pointer to the file name
    ...

Warning: untested code (especially the C stuff)!
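
For comparison, here is roughly what the "main" version above does, written in C. This is only a sketch: the helper name open_first_arg is made up for illustration, and it assumes the same "exactly one parameter" rule as the asm.

```c
#include <fcntl.h>   /* open, O_RDONLY */
#include <stddef.h>  /* NULL */

/* Hypothetical helper: returns a read-only descriptor for argv[1],
   or -1 if the argument count is wrong or the open fails. */
int open_first_arg(int argc, char **argv)
{
    if (argc != 2)                    /* same test as "cmp dword [ebp + 8], 2" */
        return -1;                    /* the caller would print a usage message */
    return open(argv[1], O_RDONLY);   /* O_RDONLY is 0, hence "xor ecx, ecx" */
}
```

Calling it from main as open_first_arg(argc, argv) gives you the descriptor that int 80h would leave in eax, except that the C library turns a failure into -1 and sets errno.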

Best,
Frank

nobody

  • Guest
Re: how to deal with arguments
« Reply #2 on: February 07, 2008, 10:05:04 AM »
hm i don't quite understand what ebp is
i found "current frame pointer" , "points to the base of the stack" o_O

PS: why does the "SS" register still exist in 32bit code (SS:ESP)? is that still a segmented mode? when is SS set then?

nobody

  • Guest
Re: how to deal with arguments
« Reply #3 on: February 07, 2008, 10:56:10 PM »
ebp is a 32-bit general-purpose register. That was easy! :)

What makes ebp "special" is that - like esp - it defaults to using ss as its segment register.

*All* addresses on x86 involve a segment and an offset. The "rules" for interpreting the value in a segment register are completely different in real mode and in protected mode. In real mode, it's multiplied by 16 and added to the offset to form the complete linear address. In protected mode, the value in a segreg ("selector") serves as an index referring to a "descriptor" which, amongst other information, includes a "base" which is added to the offset to form a linear address. In any OS you're likely to encounter, the OS loads the segregs (ss included) before your program starts, and the "base" is zero for cs, ds, es and ss ("flat memory model"). The usual exceptions are fs and gs, which 32-bit Windows (fs) and Linux (gs) use for per-thread data. So we can pretend segregs don't exist (gladly!!!), unless we're OS developers. More info here:

http://my.execpc.com/~geezer/johnfine/segments.htm

Or... I understand the CPU manufacturers have Friendly Manuals...
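
If the arithmetic helps, the two address calculations described above can be sketched in C. The function names are invented for illustration, and the protected-mode one simply takes the descriptor's base as a parameter rather than modelling the descriptor table lookup.

```c
#include <stdint.h>

/* Real mode: the segment value is multiplied by 16 and added to the offset. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Protected mode: the base comes from the descriptor that the selector
   refers to. In a flat memory model the base is 0, so linear == offset. */
uint32_t protected_mode_linear(uint32_t descriptor_base, uint32_t offset)
{
    return descriptor_base + offset;
}
```

For example, real_mode_linear(0xB800, 0) gives 0xB8000, while protected_mode_linear(0, x) just gives x back, which is why flat-model code can ignore the segregs.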

But to get back to ebp... it's just a register, but it is "conventional" to use it as a "stack frame pointer". In 16-bit code, "[sp]" is not a valid addressing mode (!!!), so if we want to refer to items on the stack (besides push/pop), we must:

mov bp, sp

in order to reference parameters passed to the function...

mov ax, [bp + ??]

If we need some local variables - variables which exist for the duration of this function and are then "free"d...

mov bp, sp
sub sp, 2 ; just one local
mov ax, [bp + ??] ; get a parameter
shl ax, 3 ; why not?
mov [bp - 2], ax ; save it in our local var
...

mov sp, bp ; restore original sp!!!
ret

Since our caller is presumably using bp too:

our_thingie:
push bp
mov bp, sp
sub sp, bytes-of-locals
...
mov sp, bp
pop bp
ret
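
In C terms, the frame above is what a compiler sets up for a function with one parameter and one local. A rough equivalent of the earlier snippet follows; it assumes the value left in ax is treated as the return value, which the asm doesn't actually say.

```c
#include <stdint.h>

uint16_t our_thingie(uint16_t param)   /* param is the [bp + ??] item */
{
    uint16_t local;                    /* "sub sp, 2": room for one local */
    local = (uint16_t)(param << 3);    /* "shl ax, 3", then "mov [bp - 2], ax" */
    return local;                      /* assumed: result handed back in ax */
}
```

The push bp / mov bp, sp at the top and the mov sp, bp / pop bp at the bottom are exactly the parts C hides from you.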

Now in 32-bit code, we have the option of accessing parameters/locals as [esp +/- ???], so the use of ebp might be considered "old school", but it's still convenient - keeping track of the ??? in [ebp +/- ???] tends to be easier than [esp +/- ???].

(the push ebp/mov ebp, esp/sub esp, ??? sequence is so common that the x86 architecture provides the "enter" instruction to encapsulate this. Likewise, "leave" does the mov esp, ebp/pop ebp. I do 'em the "long way" for "clarity"???)

To get back to your original question about getting command-line parameters in your program, ebp needs to concern you *only* if your program entry is "main". In that case, the "_start" label (the "real" entrypoint) is in the C startup code, and  "main" is "call"ed with "int argc", "char **argv", and "char **envp" as parameters. You could access 'em as [esp + ???], but you probably want to use ebp.

If your code starts with "_start" (or some other name that you've told ld about - "-e myentry"), the argument count will be the "first" thing on the stack, "followed" (working upward) by the addresses of the "args": first the name of this program (there's always at least one), then the command line arguments, if any, then a (dword) zero. After that come the addresses of the environment variables in a similar format, also terminated by a (dword) zero.

If "pop"ing stuff you didn't "push" bothers you, or if the "gone forever" nature of this method bothers you, there are other ways to do it. An idea I like is:

_start:
    mov [initial_esp], esp   ; initial_esp: a dword you reserve yourself (e.g. "resd 1" in .bss)

Now, from anywhere in your program, no matter how esp/ebp have been (ab)used, you can find your command line arguments and/or environment variables with a simple calculation. You probably want to do this first anyway, so it's "just a thought"...
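
The "simple calculation" can be sketched in C as pointer arithmetic on the saved stack pointer. This is a model of the layout, not kernel code: the struct and function names are made up, and every stack slot is treated as one pointer-sized word (4 bytes in the 32-bit case discussed here).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of what sits at the saved initial stack pointer. */
struct startup_view {
    int    argc;
    char **argv;   /* argc pointers, then a NULL */
    char **envp;   /* environment pointers, then a NULL */
};

/* initial_sp points at the slot holding argc, as esp does at _start. */
struct startup_view locate_args(char **initial_sp)
{
    struct startup_view v;
    v.argc = (int)(intptr_t)initial_sp[0];  /* [initial_esp] */
    v.argv = initial_sp + 1;                /* initial_esp + 4 */
    v.envp = v.argv + v.argc + 1;           /* skip the args and their NULL */
    return v;
}
```

In 32-bit asm the same arithmetic puts argv[i] at [initial_esp + 4 + 4*i] and the first environment pointer at [initial_esp + 4*(argc + 2)].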

If you're continuing to have trouble finding your command line parameters, show us how your code starts off and how you're trying to do it...

Best,
Frank