Topic: Calculate length of a command line input...

Offline pikestar

  • Jr. Member
  • *
  • Posts: 5
Calculate length of a command line input...
« on: June 12, 2011, 04:52:59 PM »
Hi, I'm just playing around with assembly for the first time. At the moment I'm trying to combine the hello world program in an assembly guide with Jonathan Leto's Writing A Useful Program With NASM. I want to get the program to echo the first parameter passed to it. The trouble is I'm not sure how to work out the length of the parameter, which is needed in register edx for the sys_write system call (at the moment I've just hard-coded 3).

Code:
section .data ;store all program data

section .text
global _start ;tells linker where to start

_start: ;program starts here

pop ebx
pop ebx ;remove number of args and program name from stack
pop ecx ;put argument into ecx ready to output
mov eax,4 ;the system call for write
mov ebx,1 ;file descriptor for std output
mov edx,3 ;need length of parameter for edx
int 80h ;call kernel

mov eax,1 ;sys call exit
mov ebx,0 ;return 0 i.e. no error
int 80h ;call kernel


Any help appreciated.

Offline Frank Kotler

  • NASM Developer
  • Hero Member
  • *****
  • Posts: 2667
  • Country: us
Re: Calculate length of a command line input...
« Reply #1 on: June 12, 2011, 05:47:10 PM »
ecx will point to a zero-terminated string, so find the zero...

Code:
...
    pop ebx             ; number of args (argc)

; better make sure we've got one!
    cmp ebx, 2
    jb usage            ; print a usage message and exit?

    pop ebx             ; remove program name from stack
    pop ecx             ; put argument into ecx ready to output

; get length in edx
    xor edx, edx
getlen:
    cmp byte [ecx + edx], 0
    jz gotlen
    inc edx
    jmp getlen
gotlen:

    mov eax,4           ; the system call for write
    mov ebx,1           ; file descriptor for std output
    int 80h             ; call kernel
...

Or some such...
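
For reference, the fragment above can be filled out into a complete program. This is only a sketch, assuming 32-bit Linux and the int 80h interface; the file name echoarg.asm, the usage_msg text, and the choice to print the usage message on stderr with exit status 1 are assumptions for illustration, not part of the original fragment.

Code:
; echoarg.asm (hypothetical file name)
; build: nasm -f elf32 echoarg.asm && ld -m elf_i386 echoarg.o -o echoarg

section .data
usage_msg: db "usage: echoarg <text>", 10
usage_len  equ $ - usage_msg

section .text
global _start

_start:
    pop eax                     ; argc
    cmp eax, 2
    jb usage                    ; no argument given
    pop ebx                     ; argv[0], the program name (discarded)
    pop ecx                     ; argv[1], pointer to the zero-terminated argument

    xor edx, edx                ; edx = 0, will count the bytes
getlen:
    cmp byte [ecx + edx], 0     ; reached the terminating zero?
    jz gotlen
    inc edx
    jmp getlen
gotlen:
    mov eax, 4                  ; sys_write
    mov ebx, 1                  ; stdout
    int 80h                     ; ecx = buffer, edx = length

    mov eax, 1                  ; sys_exit
    xor ebx, ebx                ; status 0
    int 80h

usage:
    mov eax, 4                  ; sys_write
    mov ebx, 2                  ; stderr
    mov ecx, usage_msg
    mov edx, usage_len
    int 80h

    mov eax, 1                  ; sys_exit
    mov ebx, 1                  ; status 1, argument missing
    int 80h

Running ./echoarg hello should print hello without a trailing newline, since the argument string itself does not contain one.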

Best,
Frank


Offline pikestar

  • Jr. Member
  • *
  • Posts: 5
Re: Calculate length of a command line input...
« Reply #2 on: June 12, 2011, 06:02:09 PM »
Got it. Simple when you think about it. I suppose I'm still feeling a bit cautious of assembly, so I'm not sparking as I should! Need to get out of the habit of relying on ready-made functions. :D

Offline JoeCoder

  • Jr. Member
  • *
  • Posts: 72
  • Country: by
Re: Calculate length of a command line input...
« Reply #3 on: June 12, 2011, 06:07:44 PM »
It seems ironic to me that UNIX is based on null terminated strings everywhere except where they would actually be nice to use, like when calling their kernel functions. Why should we have to pass a length? Go look for the null, ya buttheads!  :D  :o  >:(
If you can't code it in assembly, it can't be coded!

Offline pikestar

  • Jr. Member
  • *
  • Posts: 5
Re: Calculate length of a command line input...
« Reply #4 on: June 12, 2011, 06:32:20 PM »
Quick question looking at the code.

Quote from: Frank Kotler
; get length in edx
    xor edx, edx
getlen:
    cmp byte [ecx + edx], 0
    jz gotlen
    inc edx
    jmp getlen
gotlen:

Is the use of the XOR just to ensure edx is set to zero (i.e. the start of the string), or is it for something else? If it is to set it to zero, why use it instead of mov edx,0?
Sorry for the stupid questions!

Offline JoeCoder

  • Jr. Member
  • *
  • Posts: 72
  • Country: by
Re: Calculate length of a command line input...
« Reply #5 on: June 12, 2011, 07:00:30 PM »
I read in Duntemann's book that xor used to be faster than mov with an immediate operand. He implies things have changed, but who knows. There are a few other ways to zero a register, like subtracting it from itself (don't know if that's a good one or even exists on x86), etc.

Offline Mathi

  • Jr. Member
  • *
  • Posts: 82
  • Country: in
    • Win32NASM
Re: Calculate length of a command line input...
« Reply #6 on: June 12, 2011, 07:33:42 PM »
xor edx, edx

actually takes fewer bytes (2 bytes)

as opposed to mov edx, 0 (5 bytes: one opcode byte plus a 32-bit immediate).

So the executable size will be smaller.

(mov is not faster than xor on current processors, since xor reg,reg is recognized as a zeroing idiom, so I too use xor reg,reg to initialize a register to zero  :)
It is a general practice, as you will know if you have stumbled across some size optimization tutorials.)
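
The sizes can be checked by assembling the three candidates as a flat binary and reading the listing file. A small sketch, assuming 32-bit code; the file name zero.asm is hypothetical, and the byte values in the comments are the standard encodings for these instructions.

Code:
; zero.asm (hypothetical file name)
; build: nasm -f bin zero.asm -o zero.bin -l zero.lst
; then read zero.lst to see the bytes emitted for each line

bits 32

    xor edx, edx        ; 31 D2           2 bytes, changes the flags
    sub edx, edx        ; 29 D2           2 bytes, changes the flags
    mov edx, 0          ; BA 00 00 00 00  5 bytes, leaves the flags alone

The one practical difference besides size is the flags: if a later conditional jump depends on flags set before the register is cleared, mov is the form that preserves them.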

Offline Frank Kotler

  • NASM Developer
  • Hero Member
  • *****
  • Posts: 2667
  • Country: us
Re: Calculate length of a command line input...
« Reply #7 on: June 12, 2011, 07:38:10 PM »
Yeah, "xor edx, edx" is just to zero edx. Use "mov edx, 0" if it's clearer to you - "sub edx, edx" would also work. Programmer's choice! (isn't it nice being the programmer?)

Quote
Why should we have to pass a length? Go look for the null, ya buttheads!

Why should we have to go looking for the null, when the length is known in most cases (not this one)? Zero-terminated strings are actually kind of a dumb data structure (IMO). Length-prefixed strings (a la Pascal) are usually more efficient! (programmer's choice again, except where we have to interface with C).
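
As an illustration of the length-prefixed idea, here is a minimal sketch of a counted string with a single length byte in front of the text, roughly in the style of classic Pascal strings. It assumes 32-bit Linux; the file name and the labels msg and msg_end are made up for the example, and a one-byte prefix limits the text to 255 bytes.

Code:
; pstring.asm (hypothetical file name)
; build: nasm -f elf32 pstring.asm && ld -m elf_i386 pstring.o -o pstring

section .data
; one length byte, then the characters; no terminator needed
msg:    db msg_end - msg - 1
        db "Hello from a counted string", 10
msg_end:

section .text
global _start

_start:
    movzx edx, byte [msg]       ; length comes straight from the prefix
    mov ecx, msg + 1            ; text starts just after the length byte
    mov ebx, 1                  ; stdout
    mov eax, 4                  ; sys_write
    int 80h

    mov eax, 1                  ; sys_exit
    xor ebx, ebx                ; status 0
    int 80h

No scanning loop is needed here, because the length is read with a single load.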

Best,
Frank


Offline JoeCoder

  • Jr. Member
  • *
  • Posts: 72
  • Country: by
Re: Calculate length of a command line input...
« Reply #8 on: June 12, 2011, 07:54:48 PM »
Quote from: Frank Kotler
Yeah, "xor edx, edx" is just to zero edx. Use "mov edx, 0" if it's clearer to you - "sub edx, edx" would also work. Programmer's choice! (isn't it nice being the programmer?)

We use a logical (unsigned) subtract where I come from. Is there one of those on x86 and is it faster than a regular sub?

Quote from: Frank Kotler
Why should we have to go looking for the null, when the length is known in most cases (not this one)? Zero-terminated strings are actually kind of a dumb data structure (IMO). Length-prefixed strings (a la Pascal) are usually more efficient! (programmer's choice again, except where we have to interface with C).

That's what I was saying, the irony. It seems ironic to me that UNIX is based on null terminated strings everywhere except where they would actually be nice to use, like when calling their kernel functions. Virtually everywhere in *NIX they expect null-terminated strings because of C. The one time it would have helped, so we didn't have to figure out a length, they pull the rug out from under us and expect us to give them a length. Why I oughta!

PL/I had length-prefixed strings way before Pascal. It still does, and the string code is really fast because of it: no scanning and no buffer overflows, because the compiler knows the limits and will not move more than a maximal number of bytes to a target. I believe COBOL does the same thing, but I will have to double check later. I am talking about the IBM mainframe versions; I have no idea what other implementations do.
« Last Edit: June 12, 2011, 07:57:13 PM by JoeCoder »

Offline Bryant Keller

  • Forum Moderator
  • Full Member
  • *****
  • Posts: 360
  • Country: us
    • About Bryant Keller
Re: Calculate length of a command line input...
« Reply #9 on: June 13, 2011, 07:30:52 AM »
Quote from: JoeCoder
It seems ironic to me that UNIX is based on null terminated strings everywhere except where they would actually be nice to use, like when calling their kernel functions. Why should we have to pass a length? Go look for the null, ya buttheads!  :D  :o  >:(

Actually, sys_write and sys_read are BINARY input/output routines. Yes, they are used to work with text on the console via the stdin, stdout, and stderr file descriptors, but keep in mind that you might not always be using text. Say, for example, you want to read the contents of a bitmap image into memory. In such a case you really don't want your sys_read routine to terminate at every null. The same can be said for when you want to dump that bitmap image back to disk.
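
To make the binary point concrete, here is a small sketch that writes a buffer containing zero bytes. It assumes 32-bit Linux; the file name, the label blob, and its contents are arbitrary bytes chosen for the example, not a real bitmap header.

Code:
; blob.asm (hypothetical file name)
; build: nasm -f elf32 blob.asm && ld -m elf_i386 blob.o -o blob

section .data
blob:     db 'B', 'M', 0, 0, 1, 0, 'x', 10      ; arbitrary bytes, zeros included
blob_len  equ $ - blob                          ; 8 bytes

section .text
global _start

_start:
    mov edx, blob_len           ; explicit length, so the zeros are just data
    mov ecx, blob
    mov ebx, 1                  ; stdout
    mov eax, 4                  ; sys_write
    int 80h

    mov eax, 1                  ; sys_exit
    xor ebx, ebx                ; status 0
    int 80h

Piping the output through wc -c should report 8 bytes. A write call that stopped at the first null would have produced only 2.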

Quote from: Frank Kotler
Why should we have to go looking for the null, when the length is known in most cases (not this one)? Zero-terminated strings are actually kind of a dumb data structure (IMO). Length-prefixed strings (a la Pascal) are usually more efficient! (programmer's choice again, except where we have to interface with C).

I completely agree about having the length of the string with the string itself. That is one of the few places where I tend to be "wasteful" and actually deal with string references rather than strings themselves.

Code: (strdemo.asm)
; build with : nasm -f elf strdemo.asm && gcc -nostartfiles -nostdlib strdemo.o -o strdemo

bits 32

__NR_exit equ 1
__NR_write equ 4

STDOUT equ 1

section .rodata
; Our DB arrays and size equates
dbaHello: DB "Hello, World!", 10, 0
dbaHello_size EQU ($-dbaHello-1)

dbaBye: DB "Goodbye, World!", 10, 0
dbaBye_size EQU ($-dbaBye-1)

section .data
; "String" structures
Hello:
.length DD dbaHello_size
.string DD dbaHello

Bye:
.length DD dbaBye_size
.string DD dbaBye

section .text
 global _start
_start:
 mov edx, [Hello.length]
 mov ecx, [Hello.string]
 mov ebx, STDOUT
 mov eax, __NR_write
 int 0x80

 mov edx, [Bye.length]
 mov ecx, [Bye.string]
 mov ebx, STDOUT
 mov eax, __NR_write
 int 0x80

 xor ebx, ebx
 mov eax, __NR_exit
 int 0x80

In the above example I use $-label-1 because the length is 1 byte (the null terminator) less than the full ASCIIZ string. I still put the null terminator in strings simply because I might someday use that string with a C function, and as a good forward-thinking programmer I don't want the possible C functions to puke on me. :D

I started doing this when I was coding under Windows and I wanted to keep all my strings themselves in the CONST section; later I just adapted the same practice to the .rodata section. It's mostly useful for dynamic strings where you might need to resize the allocated space, and you can set .length to contain the max space allocated for that string (if reached, just get more memory). Personally though, I think it's a good trade-off: yeah, I may be using up more space, but I'll never have to do a null byte search. :P
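
One way the string reference layout can pay off is a single output routine that takes a pointer to the structure. The following is only a sketch under the same 32-bit Linux assumptions as strdemo.asm; write_str, Msg, and text are hypothetical names introduced for this example.

Code:
; strref.asm (hypothetical file name)
; build: nasm -f elf32 strref.asm && ld -m elf_i386 strref.o -o strref

bits 32

section .rodata
text:       db "Counted and zero-terminated", 10, 0
text_size   equ $ - text - 1    ; length without the null terminator

section .data
Msg:
.length     dd text_size
.string     dd text

section .text
global _start

; write_str (hypothetical helper)
; in: esi = address of a {length, pointer} pair laid out as above
write_str:
    mov edx, [esi]              ; .length
    mov ecx, [esi + 4]          ; .string
    mov ebx, 1                  ; stdout
    mov eax, 4                  ; sys_write
    int 0x80
    ret

_start:
    mov esi, Msg
    call write_str

    xor ebx, ebx                ; status 0
    mov eax, 1                  ; sys_exit
    int 0x80

The caller only ever handles one pointer, and the routine never has to scan for the null.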

About Bryant Keller
bkeller@about.me

Offline JoeCoder

  • Jr. Member
  • *
  • Posts: 72
  • Country: by
Re: Calculate length of a command line input...
« Reply #10 on: June 13, 2011, 08:39:27 AM »
Oops, yeah, I was thinking only of text. You are right, that won't work for binary data. I have to keep telling myself, in *NIX everything is a file.

Thanks :-)