Author Topic: Calculate length of a command line input... (Read 39319 times)

pikestar · « **on:** June 12, 2011, 04:52:59 PM »

Hi, just playing around with assembly for the first time. At the moment I'm trying to combine the hello world program in an assembly guide with Jonathan Leto's Writing A Useful Program With NASM. I want to get the program to echo the first parameter passed to it. The trouble is I'm not sure how to work out the length of the parameter, which is needed in register edx for the system call sys_write (at the moment I've just hard-coded 3).

Code: [Select]

section .data		;store all program data

section .text
	global _start	;tells linker where to start

_start:			;program starts here

	pop ebx		;argument count (discarded)
	pop ebx		;program name (discarded)
	pop ecx		;put argument into ecx ready to output
	mov eax,4	;the system call for write
	mov ebx,1	;file descriptor for std output
	mov edx,3	;need length of parameter for edx
	int 80h		;call kernel

	mov eax,1	;sys call exit
	mov ebx,0	;return 0 i.e. no error
	int 80h		;call kernel

Any help appreciated.

Frank Kotler · « **Reply #1 on:** June 12, 2011, 05:47:10 PM »

ecx will point to a zero-terminated string, so find the zero...

Code: [Select]

...
	pop ebx		;argument count

; better make sure we've got one!
	cmp ebx, 2
	jb usage	;print a usage message and exit?

	pop ebx		;remove program name from stack
	pop ecx		;put argument into ecx ready to output

; get length in edx
	xor edx, edx
getlen:
	cmp byte [ecx + edx], 0
	jz gotlen
	inc edx
	jmp getlen
gotlen:

	mov eax,4	;the system call for write
	mov ebx,1	;file descriptor for std output
	int 80h		;call kernel
...

Or some such...

Best,
Frank
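For anyone who wants to assemble this straight away, here is one way the fragment above might be filled out into a complete program. It is only a sketch: the file name echoarg.asm, the usage text, and the build line are assumptions (a 32-bit Linux target with nasm and GNU ld), not something from the posts above.

```nasm
; echoarg.asm (hypothetical name)
; assumed build: nasm -f elf32 echoarg.asm && ld -m elf_i386 echoarg.o -o echoarg

bits 32

section .rodata
usage_msg:	db "usage: echoarg <text>", 10
usage_len	equ $ - usage_msg

section .text
	global _start

_start:
	pop eax			;argument count
	cmp eax, 2
	jb usage		;no parameter given

	pop ebx			;program name (discarded)
	pop ecx			;pointer to first parameter

; get length in edx by scanning for the terminating zero
	xor edx, edx
getlen:
	cmp byte [ecx + edx], 0
	jz gotlen
	inc edx
	jmp getlen
gotlen:

	mov eax, 4		;sys_write
	mov ebx, 1		;stdout
	int 80h

	mov eax, 1		;sys_exit
	xor ebx, ebx		;return 0
	int 80h

usage:
	mov eax, 4		;sys_write
	mov ebx, 2		;stderr
	mov ecx, usage_msg
	mov edx, usage_len
	int 80h

	mov eax, 1		;sys_exit
	mov ebx, 1		;return 1 to signal the error
	int 80h
```

Note that the parameter is written without a trailing newline, since the zero terminator is not counted in the length.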

pikestar · « **Reply #2 on:** June 12, 2011, 06:02:09 PM »

Got it. Simple when you think about it. I suppose I'm still feeling a bit cautious with assembly, so not as sharp as I should be! Need to get out of the habit of relying on ready-made functions.

JoeCoder · « **Reply #3 on:** June 12, 2011, 06:07:44 PM »

It seems ironic to me that UNIX is based on null terminated strings everywhere except where they would actually be nice to use, like when calling their kernel functions. Why should we have to pass a length? Go look for the null, ya buttheads!

pikestar · « **Reply #4 on:** June 12, 2011, 06:32:20 PM »

Quick question looking at the code.

Quote from: Frank Kotler

; get length in edx
xor edx, edx
getlen:
cmp byte [ecx + edx], 0
jz gotlen
inc edx
jmp getlen
gotlen:

Is the XOR there just to ensure edx is set to zero (i.e. the start of the string), or is it for something else? If it is to set it to zero, why use it instead of mov edx,0?
Sorry for the stupid questions!

JoeCoder · « **Reply #5 on:** June 12, 2011, 07:00:30 PM »

I read in Duntemann's book xor used to be faster than mov with an immediate operand. He implies things changed but who knows. There are a few other ways to zero a register like subtracting it from itself (don't know if that's a good one or even exists in x86) etc.

Mathi · « **Reply #6 on:** June 12, 2011, 07:33:42 PM »

xor edx, edx

Actually takes fewer bytes (2 bytes)

as opposed to mov edx, 0 (5 bytes)

So the executable size will be smaller.

(mov is not faster than xor on current processors, which recognize xor reg,reg as a zeroing idiom, so I too use xor reg,reg to initialize a register to zero. One difference: xor changes the flags and mov does not.

It's general practice, as you'll see if you stumble across some size optimization tutorials.)
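To make the size comparison concrete, here are the three ways of zeroing edx mentioned in this thread, with the machine code bytes as comments. The encodings are worked out by hand for 32-bit mode; assembling with a listing file (nasm -l) is a quick way to check them on your own setup.

```nasm
bits 32

	xor edx, edx		;31 D2           2 bytes, changes flags
	sub edx, edx		;29 D2           2 bytes, changes flags
	mov edx, 0		;BA 00 00 00 00  5 bytes, leaves flags alone
```

The flags difference matters if the zeroing sits between a cmp and the conditional jump that depends on it; in that spot only the mov form is safe.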

Frank Kotler · « **Reply #7 on:** June 12, 2011, 07:38:10 PM »

Yeah, "xor edx, edx" is just to zero edx. Use "mov edx, 0" if it's clearer to you - "sub edx, edx" would also work. Programmer's choice! (isn't it nice being the programmer?)

Quote

Why should we have to pass a length? Go look for the null, ya buttheads!

Why should we have to go looking for the null, when the length is known in most cases (not this one)? Zero-terminated strings are actually kind of a dumb data structure (IMO). Length-prefixed strings (a la Pascal) are usually more efficient! (programmer's choice again, except where we have to interface with C).
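A rough sketch of what a length-prefixed string could look like in NASM, assuming a 32-bit length stored directly in front of the text. The label names (hello, hello_end) and the 4-byte prefix are just illustrative choices, not a standard layout.

```nasm
bits 32

section .rodata
hello:
	dd hello_end - hello - 4	;length prefix: byte count of the text only
	db "Hello, World!", 10
hello_end:

section .text
	global _start

_start:
	mov edx, [hello]		;length comes straight from the prefix, no scan
	lea ecx, [hello + 4]		;text starts right after the 4-byte prefix
	mov ebx, 1			;stdout
	mov eax, 4			;sys_write
	int 80h

	mov eax, 1			;sys_exit
	xor ebx, ebx
	int 80h
```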

Best,
Frank

JoeCoder · « **Reply #8 on:** June 12, 2011, 07:54:48 PM »

Quote from: Frank Kotler on June 12, 2011, 07:38:10 PM

Yeah, "xor edx, edx" is just to zero edx. Use "mov edx, 0" if it's clearer to you - "sub edx, edx" would also work. Programmer's choice! (isn't it nice being the programmer?)

We use a logical (unsigned) subtract where I come from. Is there one of those on x86 and is it faster than a regular sub?

Quote from: Frank Kotler on June 12, 2011, 07:38:10 PM

Why should we have to go looking for the null, when the length is known in most cases (not this one)? Zero-terminated strings are actually kind of a dumb data structure (IMO). Length-prefixed strings (a la Pascal) are usually more efficient! (programmer's choice again, except where we have to interface with C).

That's what I was saying, the irony. It seems ironic to me that UNIX is based on null terminated strings everywhere except where they would actually be nice to use, like when calling their kernel functions. Virtually everywhere in *nix they expect null terminated strings because of C. The one time it would have helped so we didn't have to figure a length, they pull out the rug from under us and expect us to give them a length. Why I oughta!

PL/I had length prefixed strings way before PASCAL. It still does and the string code is real fast because of it, no scanning and no buffer overflows because the compiler knows the limits and will not move more than a maximal number of bytes to a target. I believe COBOL does the same thing, but I will have to double check later. I am talking about the IBM mainframe versions, I have no idea what other implementations do.

Bryant Keller · « **Reply #9 on:** June 13, 2011, 07:30:52 AM »

Quote from: JoeCoder on June 12, 2011, 06:07:44 PM

It seems ironic to me that UNIX is based on null terminated strings everywhere except where they would actually be nice to use, like when calling their kernel functions. Why should we have to pass a length? Go look for the null, ya buttheads!

Actually, sys_write and sys_read are BINARY input/output routines. Yes, they are used to work with text on the console via the stdin, stdout, and stderr file descriptors, but keep in mind that you might not always be using text. Say, for example, you want to read the contents of a bitmap image into memory. In such a case you really don't want your sys_read routine to terminate at every null. The same can be said for when you want to dump that bitmap image back to disk.
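A small sketch of that point: the buffer below has a zero byte in the middle, and because sys_write is given an explicit length, all six bytes go out. The label names blob and blob_size are made up for the example.

```nasm
bits 32

section .rodata
blob:		db "AB", 0, "CD", 10	;data with an embedded zero byte
blob_size	equ $ - blob		;6 bytes, the zero included

section .text
	global _start

_start:
	mov eax, 4			;sys_write
	mov ebx, 1			;stdout
	mov ecx, blob
	mov edx, blob_size		;explicit length, so the zero is just data
	int 80h

	mov eax, 1			;sys_exit
	xor ebx, ebx
	int 80h
```

Piping the output through a hex dump tool such as od -c should show the zero byte sitting between "AB" and "CD".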

Quote from: Frank Kotler on June 12, 2011, 07:38:10 PM

Why should we have to go looking for the null, when the length is known in most cases (not this one)? Zero-terminated strings are actually kind of a dumb data structure (IMO). Length-prefixed strings (a la Pascal) are usually more efficient! (programmer's choice again, except where we have to interface with C).

I completely agree about having the length of the string with the string itself. That is one of the few places where I tend to be "wasteful" and actually deal with string references rather than strings themselves.

Code: (strdemo.asm) [Select]

; build with : nasm -f elf strdemo.asm && gcc -nostartfiles -nostdlib strdemo.o -o strdemo
; (on a 64-bit host, add -m32 to the gcc command)

bits 32

__NR_exit equ 1
__NR_write equ 4

STDOUT equ 1

section .rodata
; Our DB arrays and size equates
dbaHello: DB "Hello, World!", 10, 0
dbaHello_size EQU ($-dbaHello-1)

dbaBye: DB "Goodbye, World!", 10, 0
dbaBye_size EQU ($-dbaBye-1)

section .data
; "String" structures
Hello:
.length DD dbaHello_size
.string DD dbaHello

Bye:
.length DD dbaBye_size
.string DD dbaBye

section .text
 global _start
_start:
 mov edx, [Hello.length]
 mov ecx, [Hello.string]
 mov ebx, STDOUT
 mov eax, __NR_write
 int 0x80

 mov edx, [Bye.length]
 mov ecx, [Bye.string]
 mov ebx, STDOUT
 mov eax, __NR_write
 int 0x80

 xor ebx, ebx
 mov eax, __NR_exit
 int 0x80

In the above example I use $-label-1 because the length is 1 byte (the null terminator) less than the full ASCIIZ string. I still put the null terminator in strings simply because I might someday use that string with a C function, and as a good forward-thinking programmer I don't want the possible C functions to puke on me.

I started doing this when I was coding under Windows and wanted to keep all my strings themselves in the CONST section; later I just adapted the same practice to the .rodata section. It's mostly useful for dynamic strings where you might need to resize the allocated space, and you can set .length to contain the max space allocated for that string (if reached, just get more memory). Personally though, I think it's a good trade-off: yeah, I may be using up more space, but I'll never have to do a null byte search.

JoeCoder · « **Reply #10 on:** June 13, 2011, 08:39:27 AM »

Oops, yeah, I was thinking only of text. You are right, that won't work for binary data. I have to keep telling myself, in *nix everything is a file.

Thanks :-)
