Topic: Working with command line arguments

Offline qiet72

  • Jr. Member
  • *
  • Posts: 6
Working with command line arguments
« on: September 10, 2010, 12:57:10 PM »
Hi,

Still very much a beginner.  I am learning about retrieving command line arguments.  What I want to do is, for example, get the number of command line arguments and print it out to the standard output.  Here is my program:

section .text
   global _start

_start:
   pop   ecx      ; Get the number of arguments and print it out to screen
   mov   eax,4
   mov   ebx,1
   mov   edx,1
   int   80h

   mov   eax,1
   mov   ebx,0
   int   80h      ; Exit

It compiles, but nothing happens. Of course, I could print it out with "echo $?" by putting the value into ebx of exit syscall, but I want the program to tell me the number of arguments passed.  Do I need to reserve some address space (variables) and put it in there before the write syscall will work?

qiet72

Offline Frank Kotler

  • NASM Developer
  • Hero Member
  • *****
  • Posts: 2667
  • Country: us
Re: Working with command line arguments
« Reply #1 on: September 11, 2010, 02:05:57 AM »
Yep. We want the address of a buffer in ecx for sys_write. The number of arguments is not the address of a buffer! I expected that this would segfault, actually, but it returns -EFAULT - "bad address". How do I know this? I cut-and-pasted some code I had lying around into your code, and it told me! I'll attach it.

But that isn't the end of your troubles! The number of arguments is a number (duh). What we want to print is one or more ascii characters representing that number. The ascii code representing '1' (there will always be at least one argument - the name of the program) is not 1, but 49 (or 31h, which may be easier). Typing "man ascii" will get you an ascii chart. As you'll see, ascii codes under 20h (space) are "control codes". Printable characters start at 20h, numbers (numerals - characters representing numbers) start at 30h, uppercase letters start at 41h, lowercase letters start at 61h. Punctuation is tucked in between. Fortunately, the characters representing decimal digits are contiguous, so we can convert a single digit by just adding 48, or 30h, or '0' (the latter being somewhat "self-documenting"). Try this:

Code: [Select]
; nasm -f elf32 argc.asm
; ld -o argc argc.o

global _start

section .text
_start:

    pop ecx ; argc in ecx
    add ecx, '0' ; convert number to character - ASSume one digit :(
    push ecx ; need it in a buffer - use the stack
    mov ecx, esp ; address of buffer in ecx, as sys_write wants!
   
    mov edx, 1 ; number of bytes to write
    mov ebx, 1 ; file descriptor - STDOUT
    mov eax, 4 ; __NR_write
    int 80h
   
exit:
    xor ebx, ebx ; claim "no error"
    mov eax, 1 ; __NR_exit
    int 80h


That's kinda sloppy. It blindly ASSumes only one digit in argc, and prints garbage if the number takes more than one digit. Since the first argument is the program name, that's only eight arguments before it overflows. Furthermore, it doesn't print a newline after the "answer", so it bumps into the prompt. The latter is easily fixed, multiple digits are a little more complicated. I've left some "lint" in that attached file with the "checkerror" macro - subroutines that print eax as decimal, or as hex. They aren't intended to produce nicely formatted output - a "quicky" for on-the-fly debugging purposes, mostly. I left 'em in, in case you find 'em useful. If you don't understand how they work, we can discuss that...
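For what it's worth, here is a minimal sketch of the "multiple digits plus newline" fix, not the routine from the attached file. It assumes a 32-bit Linux target like the code above; the 16-byte scratch buffer and the label names are just my choices for illustration. The digits are produced by repeated division by 10, filling the buffer from the end backwards.

```nasm
; nasm -f elf32 argc10.asm
; ld -o argc10 argc10.o      (add -m elf_i386 on a 64-bit host)

global _start

section .text
_start:
    pop eax                 ; argc
    sub esp, 16             ; scratch buffer on the stack (10 digits + newline fit)
    lea ecx, [esp + 15]     ; ecx -> last byte of the buffer
    mov byte [ecx], 10      ; the newline goes last
    mov ebx, 10             ; divisor
.digit:
    xor edx, edx            ; divide edx:eax by 10
    div ebx                 ; quotient in eax, remainder in edx
    add dl, '0'             ; remainder -> ascii digit
    dec ecx
    mov [ecx], dl           ; store digits right to left
    test eax, eax
    jnz .digit              ; more digits left?

    lea edx, [esp + 16]     ; one past the end of the buffer
    sub edx, ecx            ; length = end - start
    mov ebx, 1              ; STDOUT
    mov eax, 4              ; __NR_write
    int 80h

    xor ebx, ebx
    mov eax, 1              ; __NR_exit
    int 80h
```

Since we exit right away, the sketch doesn't bother to give the 16 bytes back with "add esp, 16"; in a subroutine you would have to.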

The buffer doesn't have to be on the stack, of course. Since you're interested in variables in "section .bss", I probably should have shown you how to do it that way. Easy enough - you can probably figure it out. Or, we can discuss that, too...
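In case it helps, a rough sketch of the ".bss" variant could look like the following. The name "buf" is an arbitrary choice, and it still assumes a single digit in argc.

```nasm
; nasm -f elf32 argcbss.asm
; ld -o argcbss argcbss.o

global _start

section .bss
    buf resb 2              ; room for one digit plus a newline

section .text
_start:
    pop eax                 ; argc
    add al, '0'             ; convert to character - still ASSumes one digit
    mov [buf], al
    mov byte [buf + 1], 10  ; newline, so we don't bump into the prompt

    mov ecx, buf            ; address of buffer, as sys_write wants
    mov edx, 2              ; number of bytes to write
    mov ebx, 1              ; STDOUT
    mov eax, 4              ; __NR_write
    int 80h

    xor ebx, ebx
    mov eax, 1              ; __NR_exit
    int 80h
```

Note that "mov ecx, buf" loads the address of the variable, while "mov [buf], al" stores to its contents.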

Best,
Frank


qiet72
Re: Working with command line arguments
« Reply #2 on: September 14, 2010, 01:31:56 PM »
Thanks,

Now I have learned that you can get the address of the stack buffer by using esp, cool.  Now I want to continue printing out the other arguments.  Here is my attempt, but I haven't figured out how to do it with the next argument, which is a string.

section .text
   global _start

_start:
   pop   ecx      ; Get the number of arguments
   add   ecx, '0'   ; convert to a ascii number
   push  ecx      ; push the result into memory using the stack
   mov   ecx, esp ; move the address of the stack pointer to ecx for sys_write
   mov   eax,4    ; Function 4 - write
   mov   ebx,1    ; to stdout
   mov   edx,1    ; buffer size
   int   80h

   pop   ecx      ; Dump number of args from stack
   mov   ecx, esp ; Point ecx to the current stack pointer
   mov   eax,4    ; Function 4 - write
   mov   ebx,1    ; to stdout
   mov   edx,255    ; buffer size
   int   80h

   mov   eax,1
   mov   ebx,ecx
   int   80h      ; Exit

Of course, this writes a bunch of garbage, so I am wondering, how do I convert that garbage to a proper string?  Must I use a ".bss" or ".data" section to get the string, or can I use the stack again to retrieve the second argument and then write it out with the sys_write command?  I tried sending the output through hexdump but I can't find any hex values that match up with the ascii table in 'man ascii'.  Funny thing is, we don't know how long the second argument is, so I would have to guess a length using the .bss section.  If I do that, then I would not only print the current argument but also all the other arguments that are in the stack - how do I calculate the length of the current argument in the stack?

qiet72

Frank Kotler
Re: Working with command line arguments
« Reply #3 on: September 14, 2010, 10:04:28 PM »
After "argc", which is a number - we need to convert it to character(s) if we want to print it - the remaining values on the stack are pointers to strings. Easier(?), but as you figured out, we need to calculate the length to put in edx. There are faster ways to find string length - important if the strings are long - but for short strings (a dozen or so bytes), a naive "walk down it a byte at a time" works okay...

Code: [Select]
; nasm -f elf32 myfile.asm
; ld -o myfile myfile.o

section .text
   global _start

_start:
   pop   ecx      ; Get the number of arguments
   add   ecx, '0'   ; convert to a ascii number
   push  ecx      ; push the result into memory using the stack
   mov   ecx, esp ; move the address of the stack pointer to ecx for sys_write
   mov   eax,4    ; Function 4 - write
   mov   ebx,1    ; to stdout
   mov   edx,1    ; buffer size
   int   80h

   pop   ecx      ; Dump number of args from stack
   
nextarg:
   pop ecx ; get pointer to string

; we could have kept "argc", and used it as a loop counter
; but args pointers are terminated with zero (dword),
; so if we popped a zero, we're done (environment variables follow this)

   test ecx, ecx ; or "cmp ecx, 0"
   jz exit

; now we need to find the length of our (zero-terminated) string

   xor edx, edx ; or "mov edx, 0"
getlen:
   cmp byte [ecx + edx], 0
   jz gotlen
   inc edx
   jmp getlen
gotlen:

; now ecx -> string, edx = length
   mov   eax,4    ; Function 4 - write
   mov   ebx,1    ; to stdout
   int   80h

; probably want to print a newline here, "for looks"

   jmp nextarg ; cook until done
   
exit:
   mov   eax,1
   mov   ebx,ecx ; ??? return... something...
   int   80h      ; Exit
;--------------------------

That just runs all the arguments together - along with argc - and looks like crap, but I've left it for you to provide the "print a newline", if you want to. (a "real program" would probably "do something" with the arguments, not just print 'em, so you might not need it) If you do it "in line", you'll probably notice that you're writing repetitive bits of code, with only small variations. Might be a good place for a subroutine. This is easy - just "call my_subroutine" and end "my subroutine" with a "ret". It is not immediately obvious, but "call" and "ret" use the stack to store the return address, so when you get to the "ret", it is vitally important that esp really points to the return address! It will, but if you meddle with the stack within your subroutine, you've got to "put it back". Give it a shot!
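To make the "put it back" point concrete, here's one possible sketch, not the only way to do it. "print_line" is a made-up name; it takes a pointer to a zero-terminated string in ecx. This version also keeps argc as the loop counter (in esi) and walks the pointers with edi instead of popping them. It assumes argc is at least 1.

```nasm
; nasm -f elf32 args.asm
; ld -o args args.o

global _start

section .data
    nl db 10

section .text
_start:
    pop esi                 ; argc, kept as a loop counter (assumed >= 1)
    mov edi, esp            ; edi -> first argument pointer
.next:
    mov ecx, [edi]          ; pointer to the current string
    call print_line
    add edi, 4              ; next pointer
    dec esi
    jnz .next

    xor ebx, ebx
    mov eax, 1              ; __NR_exit
    int 80h

; print_line: ecx -> zero-terminated string; prints it plus a newline
print_line:
    push ecx                ; we meddle with the stack here...
    xor edx, edx
.len:
    cmp byte [ecx + edx], 0
    je .write
    inc edx
    jmp .len
.write:
    mov ebx, 1              ; STDOUT
    mov eax, 4              ; __NR_write
    int 80h
    mov ecx, nl
    mov edx, 1
    mov ebx, 1
    mov eax, 4
    int 80h
    pop ecx                 ; ...so we put it back: esp -> return address again
    ret
```

If you comment out the "pop ecx", the "ret" will jump to the string's address instead of back into the loop, which is a good way to see why the stack has to balance.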

Best,
Frank


qiet72
Re: Working with command line arguments
« Reply #4 on: September 15, 2010, 11:30:41 AM »
Hi,

Ok, I understand the code you wrote.  Here is the code again with a newline added.  See if I did this right:

Code: [Select]
; Get the program arguments from the stack which are (in order):
; number of arguments, program name, arg 1, arg 2, and so on

section .text
   global _start

_start:
   pop   ecx      ; Get the number of arguments
   add   ecx, '0'   ; convert to a ascii number
   push  ecx      ; push the result into memory using the stack
   mov   ecx, esp ; move the address of the stack pointer to ecx for sys_write
   mov   eax,4    ; Function 4 - write
   mov   ebx,1    ; to stdout
   mov   edx,1    ; buffer size
   int   80h

   pop   ecx        ; Dump number of args from stack
   call  newline    ; Print a newline

                            ; stack - each 'pop' gives the value of the next item on the stack;
                            ;         the list of argument pointers ends with a null (zero) dword

nextarg:
   pop   ecx ; get pointer to string

                            ; Comment by Frank Kotler:
                            ; we could have kept "argc", and used it as a loop counter
                            ; but args pointers are terminated with zero (dword),
                            ; so if we popped a zero, we're done (environment variables follow this)

   test ecx, ecx ; or "cmp ecx, 0" - Check if we reached the end of the argument list
   jz exit

                            ; now we need to find the length of our (zero-terminated) string

   xor edx, edx ; or "mov edx, 0"  ; Initialize edx to zero
getlen:
   cmp byte [ecx + edx], 0  ; Is this byte the string's terminating zero (null) byte?
   jz gotlen                ; Yes: edx now holds the length, jump out of our loop
   inc edx                  ; No: point to the next character or byte
   jmp getlen               ; and check again at the getlen label
gotlen:

                            ; now ecx -> string, edx = length
   mov   eax,4    ; Function 4 - write
   mov   ebx,1    ; to stdout
   int   80h         ; call sys_write
   call  newline    ; Print a newline character by calling our newline subroutine
   jmp   nextarg ; Process the next argument

newline:
                  ; Frank: probably want to print a newline here, "for looks"
                  ; Here is my solution to newline:

   mov   edx, 10  ; Move 'newline' character into edx
   push  edx      ; Put it into the stack
   mov   ecx, esp ; Put the current stack pointer into ecx for sys_write
   mov   eax,4    ; sys_write
   mov   ebx,1    ; stdout
   mov   edx,1    ; Only one character to print out
   int   80h      ; call sys_write
   pop   edx      ; Don't need newline anymore, get rid of it so stack pointer points to what it was before
   ret

exit:
   mov   eax,1
   mov   ebx,0
   int   80h      ; Exit

I think this somewhat completes my mission to create some assembler code to print command line arguments.  Although, in DOS assembler it takes far fewer steps:

Code: [Select]
; DOS .com program, compile with nasm -f bin <program.asm>
; no linker required :-)
; sample from http://en.wikipedia.org/wiki/Program_Segment_Prefix

org   100h

; int 21h subfunction 9 requires '$' to terminate string
xor   bx, bx
mov   bl, [80h]
mov   byte [bx + 81h], '$'

; print the string
mov   ah, 9
mov   dx, 81h
int   21h

; exit
mov   ax, 4C00h
int   21h

But learning this the hard way was much more interesting as you can see and understand the steps it takes to create code like this.
Thanks very much for the help Frank.  I'll be using this forum a lot for getting help on learning assembly.

Questions:
1: How large is the stack?  Can I keep on pushing until I run out of memory?
2: How much data can you put in a register?  Can I do "mov ecx, '<1024 bytes of data>' "?  If ecx is a 32-bit register, then I am guessing 4 bytes?

qiet72

Offline cm

  • Jr. Member
  • *
  • Posts: 65
Re: Working with command line arguments
« Reply #5 on: September 15, 2010, 12:05:12 PM »
Quote
I think this somewhat completes my mission to create some assembler code to print command line arguments.  Although, in DOS assembler it takes far fewer steps:

[...]

Yes, but in this form you'll have to parse the command line yourself if you want the argc/argv format. (Besides, the command line that can be accessed in the Program Segment Prefix (PSP) is limited to 126 bytes.)
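To give an idea of what "parse it yourself" means, here is a small sketch that prints only the first argument of a DOS .COM program. It assumes the command tail at offset 81h is terminated by a carriage return (0Dh), treats blanks and tabs as separators, and uses label names I made up. A real parser would also have to handle quoting and a '$' inside an argument.

```nasm
; nasm -f bin firstarg.asm -o firstarg.com

org 100h

    cld                     ; make lodsb count upwards
    mov si, 81h             ; start of the command tail in the PSP
skip:
    lodsb                   ; al = [si], si = si + 1
    cmp al, ' '
    je skip                 ; skip leading blanks
    cmp al, 9
    je skip                 ; ...and tabs
    dec si                  ; si -> first non-blank character
    mov dx, si              ; remember where the first argument starts
scan:
    lodsb
    cmp al, ' '
    je done
    cmp al, 9
    je done
    cmp al, 0Dh             ; carriage return ends the command tail
    jne scan
done:
    mov byte [si - 1], '$'  ; int 21h function 9 wants '$' as terminator
    mov ah, 9               ; print string at ds:dx
    int 21h

    mov ax, 4C00h           ; exit, errorlevel 0
    int 21h
```

With no arguments at all, the first byte found is the carriage return itself, so an empty string is printed.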

Quote
Questions:
1: How large is the stack?  Can I keep on pushing until I run out of memory?

Theoretically, yes. In a DOS .COM program (running in Real Mode, as opposed to Protected Mode) you'll eventually overwrite your program's own data and code (starting from the end of the program), which usually results in crashes. Protected Mode operating systems will usually abort the program if it causes a protection fault (a General Protection Fault or a page fault; Linux reports it as a segmentation fault), i.e. tries to write (stack) data where it isn't allowed to. This might not be what you are looking for: there's no kind of notification that lets you catch a "stack overflow" (when the stack runs out) before it happens. You have to prevent the stack from overflowing: evaluate how much stack space your program might need (including operating system calls) and provide that plus some reserve. Sometimes (for recursive or very deep function calls) you might want to check yourself whether you're near the end of the stack, and abort the function call if so.

Quote
2: How much data can you put in a register?  Can I do "mov ecx, '<1024 bytes of data>' "?  If ecx is a 32-bit register, then I am guessing 4 bytes?

Correct. However, you can for example use pointers to data instead of the data's value. This pointer is basically the memory address of where the data is stored, and that data can of course be larger than the pointer. So you could store a (32-bit) pointer in ecx which points to a 1024-byte block of data in memory. Although you could pass a block of 1024 bytes on the stack, that is quite heavy on the stack. Doing this for too many large parameters, or nesting functions that pass large parameters too deep, overflows the stack (see above). Thus the pointer trick is often used to pass data via the stack too.
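As an illustration only (32-bit Linux assumed, and "block" and "write_block" are names made up for this sketch), passing a pointer and a length on the stack costs 8 bytes of stack, no matter how large the block is:

```nasm
; nasm -f elf32 ptr.asm
; ld -o ptr ptr.o

global _start

section .bss
    block resb 1024         ; the data itself lives here, not on the stack

section .text
_start:
    mov edi, block
    mov ecx, 1024
    mov al, '.'
    rep stosb               ; fill the block with 1024 dots

    push dword 1024         ; second parameter: length
    push dword block        ; first parameter: a 4-byte pointer to the data
    call write_block
    add esp, 8              ; caller removes the two parameters again

    xor ebx, ebx
    mov eax, 1              ; __NR_exit
    int 80h

; write_block(pointer, length) - both parameters passed on the stack
write_block:
    mov ecx, [esp + 4]      ; first parameter, just above the return address
    mov edx, [esp + 8]      ; second parameter
    mov ebx, 1              ; STDOUT
    mov eax, 4              ; __NR_write
    int 80h
    ret
```

Inside the subroutine, [esp] holds the return address, so the parameters start at [esp + 4].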
C. Masloch

qiet72
Re: Working with command line arguments
« Reply #6 on: September 15, 2010, 12:27:00 PM »

Thanks for the info.  How do I check how much stack space I have left in assembler?  I understand that in DOS .COM, there is not very much stack space within 64K, but how much do I have in a 32-bit program?

qiet72

Offline Rob Neff

  • Forum Moderator
  • Full Member
  • *****
  • Posts: 429
  • Country: us
Re: Working with command line arguments
« Reply #7 on: September 15, 2010, 01:12:55 PM »
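If you need to, you can specify the size yourself using a linker option on Windows: link /STACK:xxxx, where xxxx is the reserved size in bytes (GNU ld's --stack xxxx does the same, but only for PE targets). On Linux the main stack's limit is normally a run-time resource limit (see ulimit -s) rather than something fixed at link time.
The defaults are normally fine for most programs - 1MB reserved on Windows, and commonly 8MB as the limit on Linux.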
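If you want to see what limit actually applies to a running 32-bit Linux program, one option is to ask the kernel. The sketch below assumes the i386 system call number 191 (ugetrlimit) and RLIMIT_STACK = 3; please double-check both against your system's headers. It prints the soft limit in bytes as a decimal number.

```nasm
; nasm -f elf32 stacklim.asm
; ld -o stacklim stacklim.o

global _start

section .bss
    rlim   resd 2           ; struct rlimit: rlim_cur, rlim_max
    outbuf resb 16          ; 10 digits plus a newline fit easily

section .text
_start:
    mov eax, 191            ; __NR_ugetrlimit (assumed i386 number)
    mov ebx, 3              ; RLIMIT_STACK (assumed value)
    mov ecx, rlim
    int 80h
    test eax, eax
    jnz .fail               ; the kernel returns a negative errno on failure

    mov eax, [rlim]         ; soft limit in bytes
    mov ecx, outbuf + 15
    mov byte [ecx], 10      ; newline goes last
    mov ebx, 10
.digit:
    xor edx, edx
    div ebx                 ; quotient in eax, remainder in edx
    add dl, '0'
    dec ecx
    mov [ecx], dl           ; store digits right to left
    test eax, eax
    jnz .digit

    mov edx, outbuf + 16
    sub edx, ecx            ; length = end - start
    mov ebx, 1              ; STDOUT
    mov eax, 4              ; __NR_write
    int 80h
    xor ebx, ebx            ; exit code 0
    jmp .exit
.fail:
    mov ebx, 1              ; exit code 1
.exit:
    mov eax, 1              ; __NR_exit
    int 80h
```

An "unlimited" setting shows up as the largest 32-bit value.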

cm
Re: Working with command line arguments
« Reply #8 on: September 15, 2010, 03:22:52 PM »
Quote
Thanks for the info.  How do I check how much stack space I have left in assembler?  I understand that in DOS .COM, there is not very much stack space within 64K, [...]

Right.

Within a DOS .COM program, your stack pointer is set to the top of available memory or to the top of the first segment allocated to your program. That means sp usually starts out as FFFEh (the 16-bit representation of -2), because modern DOS systems will usually have more than 64 KiB available for your program. You shouldn't depend on that though. The stack (on all x86 systems) starts out at high addresses, then "grows" down. This means "push ax" first subtracts 2 from sp, then stores the value of ax where sp points now. In a DOS .COM program, you can always check whether you're near the "bottom" (i.e. the end) of the stack by comparing sp with an offset behind the last code or data in your program. This is a simple program (which really does nothing) that shows how to check whether the stack is close to overflowing:

Code: [Select]
org 100h

mov ax, 4C00h
cmp sp, programend+64
jae okay

mov al, 0FFh

okay:
int 21h


db "Example data"

; All code and data of the program needs to be in front of this label, even uninitialized data (BSS).
programend:

The "org 100h" is required (as usual for .COM executables) because otherwise the label wouldn't be computed correctly by NASM. This example will usually return errorlevel 0 but would return 255 (FFh) in very tight load conditions. sp is compared to programend+64; that adjustment of 64 bytes is a guess of how much stack space a DOS call (for example, interrupt 21h) might need. (Technically, plain DOS only needs around 24 bytes for an interrupt 21h call, but there might be TSRs that need more.) I might add that "programend" is a regular label and can be named however you like; it's not a specific NASM directive. Besides, the code could be changed so that there would instead be an equ like this:

Code: [Select]
org 100h

mov ax, 4C00h
cmp sp, stackbottom
jae okay
mov al, 0FFh
okay:
int 21h

; All code and data of the program needs to be in front of this equ, even uninitialized data (BSS).
stackbottom: equ $+64

The advantage here is that the computation (and stack requirement guess) are written in one place but the label stackbottom can be used multiple times in the program.

These examples of course only work for simple programs. More advanced programs might change their stack or move their own code and data around so this simple check wouldn't work correctly.

With DOS .EXE executables, the .EXE header (which is primarily created by the linker) specifies how much stack will be available and how it is addressed, so that the exact code for .COM programs won't work there. Instead, you need to know what the lowest acceptable stack offset (value for sp) would be in the stack addressing scheme utilized by your linker, or you'll have to change to a new stack in the running program (so that your program doesn't depend on the linker or linker options).

With non-DOS operating systems the specifics change, but if you do know the lowest acceptable value for esp you can still check whether the stack is near an overflow by doing the same comparison. (In 32-bit programs, esp is used, while in 16-bit (including DOS) programs it's just sp.)
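A sketch of the same comparison for a 32-bit Linux program might look like this. Note the assumption: the "lowest acceptable value" here is a self-imposed budget below the initial esp (STACK_BUDGET is a number picked for illustration), not a limit obtained from the operating system.

```nasm
; nasm -f elf32 stackchk.asm
; ld -o stackchk stackchk.o

global _start

STACK_BUDGET equ 64 * 1024  ; hypothetical budget: 64 KiB of stack

section .bss
    stack_floor resd 1      ; lowest esp we are willing to reach

section .text
_start:
    mov eax, esp            ; esp as the program starts
    sub eax, STACK_BUDGET
    mov [stack_floor], eax

    call check_stack        ; ebx = 0 if fine, 255 if too deep
    mov eax, 1              ; __NR_exit, returning ebx as the exit code
    int 80h

; check_stack: compares esp with the floor computed at startup
check_stack:
    xor ebx, ebx
    cmp esp, [stack_floor]
    jae .ok                 ; unsigned compare, as in the .COM example
    mov ebx, 255
.ok:
    ret
```

As with the .COM version, this usually returns 0; "echo $?" shows the result.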
C. Masloch