Author Topic: How to read ARGV in Linux?  (Read 14452 times)

Offline MIchael

  • Jr. Member
  • *
  • Posts: 10
How to read ARGV in Linux?
« on: May 30, 2018, 01:08:16 PM »
I'd like to know how to read ARGV correctly in Linux, and how it works.

Offline sal55

  • Jr. Member
  • *
  • Posts: 18
Re: How to read ARGV in Linux?
« Reply #1 on: May 30, 2018, 05:04:16 PM »
Quote
I'd like to know how to read ARGV correctly in Linux, and how it works.

In your other thread you showed that you know how to locate ARGC, and that you seem to be using 64-bit code.

In that case, I think the value of ARGV is in the next 64-bit location after ARGC. That value should be a pointer to an array of pointers to strings.

Example (untested code because I work on Windows where it's a little different):
Code:
    mov rax, [....]          ; load argv parameter to rax ([rsp+8] ?)
    mov rbx, [rax]           ; rbx contains pointer to 1st parameter string
    mov rcx, [rax+8]         ; rcx contains pointer to 2nd parameter string (if present)

ARGC gives you the number of arguments (not sure if ARGC or ARGC+1; nor whether the sequence also ends with a null pointer; I guess you need to look at the docs).
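For the gcc-linked case, here is a minimal sketch under stated assumptions: 64-bit Linux, the System V AMD64 calling convention (so main receives argc in edi and argv in rsi rather than on the stack), and a hypothetical file name main_args.asm. It follows the same "pointer to an array of pointers" pattern as the snippet above and relies on the C guarantee that argv[argc] is a null pointer, with argc counting the program name as the first argument.

```nasm
; main_args.asm (hypothetical name) - print every argv[] string from main
; Assumed build: nasm -f elf64 main_args.asm && gcc main_args.o -o main_args

default rel
global main
extern puts

section .text
main:
    push    rbx                 ; save callee-saved rbx; also realigns rsp to 16 bytes
    mov     rbx, rsi            ; rbx = argv (pointer to array of string pointers)
.loop:
    mov     rdi, [rbx]          ; rdi = current string pointer (argv[i])
    test    rdi, rdi            ; argv[argc] is NULL, which ends the list
    jz      .done
    call    puts wrt ..plt      ; print the string followed by a newline
    add     rbx, 8              ; advance to the next 64-bit pointer
    jmp     .loop
.done:
    xor     eax, eax            ; return 0 from main
    pop     rbx
    ret

section .note.GNU-stack noalloc noexec nowrite progbits
```

Walking until the null pointer avoids having to decide whether the count is ARGC or ARGC+1.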
« Last Edit: May 30, 2018, 05:08:43 PM by sal55 »

Offline Frank Kotler

  • NASM Developer
  • Hero Member
  • *****
  • Posts: 2667
  • Country: us
Re: How to read ARGV in Linux?
« Reply #2 on: May 30, 2018, 09:24:32 PM »
In 32-bit Linux, if you link with gcc, you get a regular C stack:
main's return address
argc
**argv
**envp

If you link with ld, argc is first on the stack (as you know), followed by arguments (first is program name), followed by zero, followed by environment variables.

64-bit seems to be the same. Code posted by sal55 (Thank you!) seems to be for **argv (pointer to pointer), which is not quite what we've got... but would be right for gcc-linked code.

Crude example attached is totally lacking comments, but "seems to work".
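As a rough 64-bit illustration of the ld-linked layout described above, the sketch below assumes x86-64 Linux, entry at _start, and a hypothetical file name args.asm. It treats [rsp] as argc and the following 64-bit slots as the argument string pointers themselves, ending with a zero.

```nasm
; args.asm (hypothetical name) - print each argument on its own line
; Assumed build: nasm -f elf64 args.asm && ld -o args args.o

global _start

section .data
newline db 10

section .text
_start:
    lea     r13, [rsp+8]        ; [rsp] = argc, so r13 = address of the first pointer slot
.next_arg:
    mov     rsi, [r13]          ; rsi = pointer to the current zero-terminated string
    test    rsi, rsi            ; a zero slot marks the end of the arguments
    jz      .done

    xor     edx, edx            ; rdx = string length, counted below
.len:
    cmp     byte [rsi+rdx], 0
    je      .write
    inc     rdx
    jmp     .len

.write:
    mov     eax, 1              ; sys_write
    mov     edi, 1              ; fd 1 = stdout
    syscall                     ; write(1, rsi, rdx)

    mov     eax, 1              ; sys_write again, for the newline
    mov     edi, 1
    lea     rsi, [rel newline]
    mov     edx, 1
    syscall

    add     r13, 8              ; next pointer slot
    jmp     .next_arg

.done:
    mov     eax, 60             ; sys_exit
    xor     edi, edi            ; status 0
    syscall
```

Running it as ./args one two should print the program name, then one, then two, each on its own line.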

Best,
Frank


Offline MIchael

Re: How to read ARGV in Linux?
« Reply #3 on: May 30, 2018, 10:15:08 PM »
I am grateful to all of you.
I want to note that hardly any community offers direct communication with the developers. This one is a very rare exception.
Thanks for that too.
« Last Edit: May 30, 2018, 10:19:12 PM by MIchael »

Offline sal55

Re: How to read ARGV in Linux?
« Reply #4 on: June 01, 2018, 09:03:25 PM »
Quote
In 32-bit Linux, if you link with gcc, you get a regular C stack:
main's return address
argc
**argv
**envp

If you link with ld, argc is first on the stack (as you know), followed by arguments (first is program name), followed by zero, followed by environment variables.

64-bit seems to be the same. Code posted by sal55 (Thank you!) seems to be for **argv (pointer to pointer), which is not quite what we've got... but would be right for gcc-linked code.

Yet you show the stack layout for **argv type arguments, rather than the null-terminated char*argv[] array.

On Windows, it is necessary to call MS' __getmainargs() function (part of the MS C runtime) to get the argc, argv (and env) values as used by the normal C main() entry point.

If you use C, or link via gcc, it will arrange to call that function or an equivalent, as otherwise the initial stack has a different layout. But if you do things yourself (for example, writing bare ASM code and using a simpler linker), then you have to make that call yourself.

I understand that on Linux, which is more intimately related to C, the stack already has the layout used by C's main() entry point. Then it would need to be **argv rather than *argv[] otherwise Linux would also need an initialisation function to convert *argv[] followed by *env[] into **argv and **env.


Offline Frank Kotler

Re: How to read ARGV in Linux?
« Reply #5 on: June 01, 2018, 09:52:00 PM »
Perhaps my understanding of C notation is incorrect. "*argv[]" and "**argv" are the same thing, are they not? In any case, the stack you get linking with ld (entrypoint "_start") is not the same as what you get using gcc (entrypoint "main")... unless you tell gcc "-nostartfiles". The "C startup code" rearranges the stack and calls main. I think of it as "crt0.o" but I don't think that's the correct name, these days.

I may be missing your point...

Best,
Frank


Offline sal55

Re: How to read ARGV in Linux?
« Reply #6 on: June 02, 2018, 12:19:36 AM »
Quote
Perhaps my understanding of C notation is incorrect. "*argv[]" and "**argv" are the same thing, are they not? In any case, the stack you get linking with ld (entrypoint "_start") is not the same as what you get using gcc (entrypoint "main")... unless you tell gcc "-nostartfiles". The "C startup code" rearranges the stack and calls main. I think of it as "crt0.o" but I don't think that's the correct name, these days.

I'm using *argv[] in the sense it is used outside of a parameter list in C. There, 'char *argv[]' means an array of char* (i.e. each element is a pointer to a zero-terminated string). That's in line with your description:

"If you link with ld, argc is first on the stack (as you know), followed by arguments (first is program name), followed by zero, followed by environment variables."

But I don't think the stack would contain potentially so many arguments (I could be wrong, but I don't know how to run NASM on Linux to test it).

I think it more likely that the stack contains three arguments only: ARGC; ARGV, a pointer to that array of strings; and ENV, a pointer to an array of environment strings. This is how it will (has to) present to a C program.

Anyway the OP has probably already found out which it is.

Offline Frank Kotler

Re: How to read ARGV in Linux?
« Reply #7 on: June 02, 2018, 02:34:30 AM »
Quote
But I don't think the stack would contain potentially so many arguments (I could be wrong, but I don't know how to run NASM on Linux to test it).

I think it more likely that the stack contains three arguments only: ARGC; ARGV, a pointer to that array of strings;

What you say makes sense... or should. However, what I observe is that "all those arguments" are on the stack. When I say the "arguments" are on the stack, I mean addresses of zero-terminated strings. Where are those zero-terminated strings? Well, further up the stack. This process's copy of the environment variables is on the stack, too. There's an amazing amount of cruft on the stack before we get ahold of it. If we link against that "C startup code" we get the three arguments you expect... but the "array of pointers to zero-terminated strings" is this stuff on the stack. At least that's what I've found. I'm less certain of the 64-bit situation.
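For the 64-bit situation, one way to check where the environment pointers sit is the sketch below. It assumes x86-64 Linux, entry at _start (linked with ld), and a hypothetical file name env0.asm. It skips argc, the argc argument pointers, and the terminating zero, then prints the first environment string if there is one.

```nasm
; env0.asm (hypothetical name) - print the first environment string
; Assumed build: nasm -f elf64 env0.asm && ld -o env0 env0.o

global _start

section .data
newline db 10

section .text
_start:
    mov     rax, [rsp]              ; rax = argc
    lea     rbx, [rsp + rax*8 + 16] ; skip argc (8), argc pointers (8 each), and the zero (8)
    mov     rsi, [rbx]              ; rsi = pointer to the first environment string
    test    rsi, rsi                ; zero here means the environment is empty
    jz      .exit

    xor     edx, edx                ; rdx = length of the string
.len:
    cmp     byte [rsi+rdx], 0
    je      .write
    inc     rdx
    jmp     .len

.write:
    mov     eax, 1                  ; sys_write
    mov     edi, 1                  ; fd 1 = stdout
    syscall                         ; write(1, rsi, rdx)

    mov     eax, 1                  ; sys_write for the trailing newline
    mov     edi, 1
    lea     rsi, [rel newline]
    mov     edx, 1
    syscall

.exit:
    mov     eax, 60                 ; sys_exit
    xor     edi, edi                ; status 0
    syscall
```

Comparing the value loaded into rsi with rsp in a debugger is a simple way to confirm that the strings themselves live at higher addresses than the pointer slots.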

I understand that in Windows you've got a "GetArgs"(?) API. I wonder what you'd see if you looked on the stack, though. I wouldn't be too surprised if those arguments were there. Gotta be somewhere!

As you point out, Michael has probably figured this out... but maybe there's more to discuss?

Best,
Frank