NASM - The Netwide Assembler

NASM Forum => Using NASM => Topic started by: pikestar on June 12, 2011, 04:52:59 PM

Title: Calculate length of a command line input...
Post by: pikestar on June 12, 2011, 04:52:59 PM: Hi, just playing around with assembly for the first time. At the moment I'm trying to combine the hello world program in an assembly guide (http://asm.sourceforge.net/intro/hello.html) with Jonathan Leto's Writing A Useful Program With NASM (http://leto.net/writing/nasm.php). I want to get the program to echo the first parameter passed to it. The trouble is I'm not sure how to work out the length of the parameter, which is needed in register edx for the system call sys_write (at the moment I've just hard-coded 3).

Code:
    section .data               ;store all program data
    section .text
        global _start           ;tells linker where to start
    _start:                     ;program starts here
        pop ebx
        pop ebx                 ;remove number of args and program name from stack
        pop ecx                 ;put argument into ecx ready to output
        mov eax,4               ;the system call for write
        mov ebx,1               ;file descriptor for std output
        mov edx,3               ;need length of parameter for edx
        int 80h                 ;call kernel
        mov eax,1               ;sys call exit
        mov ebx,0               ;return 0 i.e. no error
        int 80h                 ;call kernel

Any help appreciated.
Title: Re: Calculate length of a command line input...
Post by: Frank Kotler on June 12, 2011, 05:47:10 PM: ecx will point to a zero-terminated string, so find the zero...

Code:
    ...
        pop ebx                 ;number of args
        ; better make sure we've got one!
        cmp ebx, 2
        jb usage                ; print a usage message and exit?
        pop ebx                 ;remove program name from stack
        pop ecx                 ;put argument into ecx ready to output
        ; get length in edx
        xor edx, edx
    getlen:
        cmp byte [ecx + edx], 0
        jz gotlen
        inc edx
        jmp getlen
    gotlen:
        mov eax,4               ;the system call for write
        mov ebx,1               ;file descriptor for std output
        int 80h                 ;call kernel
    ...
Or some such...

Best,
Frank
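
For reference, the fragments above can be assembled into one complete program. This is only a sketch under stated assumptions: the file name echo1.asm, the build line, the usage text, and the exit/usage labels are not from the thread, and it targets 32-bit Linux with the int 80h interface used in the posts above.

```nasm
; echo1.asm - print the first command-line argument (32-bit Linux, int 80h)
; assumed build: nasm -f elf32 echo1.asm && ld -m elf_i386 echo1.o -o echo1

section .data
usage_msg:  db "usage: echo1 <text>", 10   ; hypothetical usage text
usage_len   equ $ - usage_msg

section .text
global _start
_start:
    pop     ebx                 ; argc
    cmp     ebx, 2
    jb      usage               ; no argument given
    pop     ebx                 ; argv[0], the program name (discarded)
    pop     ecx                 ; argv[1], pointer to a zero-terminated string

    xor     edx, edx            ; edx = 0, running length
getlen:
    cmp     byte [ecx + edx], 0 ; reached the terminating zero?
    jz      gotlen
    inc     edx
    jmp     getlen
gotlen:
    mov     eax, 4              ; sys_write
    mov     ebx, 1              ; stdout
    int     80h
    xor     ebx, ebx            ; exit status 0
    jmp     exit

usage:
    mov     eax, 4              ; sys_write
    mov     ebx, 1              ; stdout
    mov     ecx, usage_msg
    mov     edx, usage_len
    int     80h
    mov     ebx, 1              ; exit status 1

exit:
    mov     eax, 1              ; sys_exit, status already in ebx
    int     80h
```

The argument is written exactly as passed, so no newline follows it unless the caller supplies one.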
Title: Re: Calculate length of a command line input...
Post by: pikestar on June 12, 2011, 06:02:09 PM: Got it. Simple when you think about it. I suppose I'm still feeling a bit cautious of assembly, so not sparking as I should! Need to get out of the habit of relying on ready-made functions. :D
Title: Re: Calculate length of a command line input...
Post by: JoeCoder on June 12, 2011, 06:07:44 PM: It seems ironic to me that UNIX is based on null terminated strings everywhere except where they would actually be nice to use, like when calling their kernel functions. Why should we have to pass a length? Go look for the null, ya buttheads! :D :o >:(
Title: Re: Calculate length of a command line input...
Post by: pikestar on June 12, 2011, 06:32:20 PM: Quick question looking at the code.

Quote from: Frank Kotler
; get length in edx
xor edx, edx
getlen:
cmp byte [ecx + edx], 0
jz gotlen
inc edx
jmp getlen
gotlen:

Is the use of the XOR just to ensure edx is set to zero (i.e. the start of the string) or is it for something else? If it is to set it to zero, why use it instead of mov edx,0?
Sorry for the stupid questions!
Title: Re: Calculate length of a command line input...
Post by: JoeCoder on June 12, 2011, 07:00:30 PM: I read in Duntemann's book xor used to be faster than mov with an immediate operand. He implies things changed but who knows. There are a few other ways to zero a register like subtracting it from itself (don't know if that's a good one or even exists in x86) etc.
Title: Re: Calculate length of a command line input...
Post by: Mathi on June 12, 2011, 07:33:42 PM: xor edx, edx

Actually takes fewer bytes (2 bytes, I guess)

as opposed to mov edx, 0 (5 bytes, I guess)

So the executable size will be smaller.

(mov is faster than xor , but i too use xor reg,reg to initialize a reg. to zero :)
A general practice... if you had stumbled across some size optimization tutorials )
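
The sizes are easy to check with a listing file. A small sketch, assuming a scratch file named zero.asm (the name is made up); the bytes in the comments are the standard 32-bit encodings of these three instructions.

```nasm
; zero.asm - three ways to zero edx, with their 32-bit encodings
; assumed command: nasm -f bin -l zero.lst zero.asm   (then read zero.lst)
bits 32

    xor     edx, edx    ; 31 D2           2 bytes, modifies flags
    sub     edx, edx    ; 29 D2           2 bytes, modifies flags
    mov     edx, 0      ; BA 00 00 00 00  5 bytes, leaves flags untouched
```

Size is the only thing the listing settles. Speed depends on the processor, though Intel's and AMD's optimization guides describe xor reg, reg as a recognized zeroing idiom on their recent processors. The flags difference matters if a later instruction relies on flags set before the register is cleared.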
Title: Re: Calculate length of a command line input...
Post by: Frank Kotler on June 12, 2011, 07:38:10 PM: Yeah, "xor edx, edx" is just to zero edx. Use "mov edx, 0" if it's clearer to you - "sub edx, edx" would also work. Programmer's choice! (isn't it nice being the programmer?)

Quote
Why should we have to pass a length? Go look for the null, ya buttheads!

Why should we have to go looking for the null, when the length is known in most cases (not this one)? Zero-terminated strings are actually kind of a dumb data structure (IMO). Length-prefixed strings (a la Pascal) are usually more efficient! (programmer's choice again, except where we have to interface with C).

Best,
Frank
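
A length-prefixed string of the kind mentioned above might look like this in NASM. It is a sketch only: the one-byte prefix (which caps the text at 255 bytes), the label names, and the build line are all assumptions for illustration.

```nasm
; pstr.asm - length-prefixed (Pascal-style) string, 32-bit Linux
; assumed build: nasm -f elf32 pstr.asm && ld -m elf_i386 pstr.o -o pstr

section .rodata
greeting:
        db      greeting_end - greeting - 1 ; length byte, excludes itself
        db      "Hello, World!", 10
greeting_end:

section .text
global _start
_start:
        movzx   edx, byte [greeting]    ; length read straight from the prefix
        mov     ecx, greeting + 1       ; text starts after the length byte
        mov     ebx, 1                  ; stdout
        mov     eax, 4                  ; sys_write
        int     80h

        xor     ebx, ebx                ; exit status 0
        mov     eax, 1                  ; sys_exit
        int     80h
```

No scan for a terminator is needed. The trade-off is that the prefix width fixes the maximum string length.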
Title: Re: Calculate length of a command line input...
Post by: JoeCoder on June 12, 2011, 07:54:48 PM: Quote from: Frank Kotler on June 12, 2011, 07:38:10 PM
Yeah, "xor edx, edx" is just to zero edx. Use "mov edx, 0" if it's clearer to you - "sub edx, edx" would also work. Programmer's choice! (isn't it nice being the programmer?)

We use a logical (unsigned) subtract where I come from. Is there one of those on x86 and is it faster than a regular sub?

Quote from: Frank Kotler on June 12, 2011, 07:38:10 PM
Why should we have to go looking for the null, when the length is known in most cases (not this one)? Zero-terminated strings are actually kind of a dumb data structure (IMO). Length-prefixed strings (a la Pascal) are usually more efficient! (programmer's choice again, except where we have to interface with C).

That's what I was saying, the irony. It seems ironic to me that UNIX is based on null terminated strings everywhere except where they would actually be nice to use, like when calling their kernel functions. Virtually everywhere in *NIX they expect null terminated strings because of C. The one time it would have helped, so we didn't have to figure out a length, they pull the rug out from under us and expect us to give them a length. Why I oughta!

PL/I had length prefixed strings way before PASCAL. It still does and the string code is real fast because of it, no scanning and no buffer overflows because the compiler knows the limits and will not move more than a maximal number of bytes to a target. I believe COBOL does the same thing, but I will have to double check later. I am talking about the IBM mainframe versions, I have no idea what other implementations do.
Title: Re: Calculate length of a command line input...
Post by: Bryant Keller on June 13, 2011, 07:30:52 AM: Quote from: JoeCoder on June 12, 2011, 06:07:44 PM
It seems ironic to me that UNIX is based on null terminated strings everywhere except where they would actually be nice to use, like when calling their kernel functions. Why should we have to pass a length? Go look for the null, ya buttheads! :D :o >:(

Actually, sys_write & sys_read are BINARY input/output routines, yes they are used to work with text on the console via stdin,stdout, stderr file descriptors, but keep in mind that you might not always be using text. Say, for example, you're wanting to read the contents of a bitmap image into memory. In such a case you really don't want your sys_read routine to terminate at every null. Same can be said for when you want to dump that bitmap image back to disk.
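
To illustrate the point, here is a minimal sketch that writes a buffer containing zero bytes. The byte values are arbitrary placeholders, not a real bitmap header, and the file name and build line are assumptions.

```nasm
; blob.asm - sys_write with data that contains zero bytes (32-bit Linux)
; assumed build: nasm -f elf32 blob.asm && ld -m elf_i386 blob.o -o blob

section .rodata
blob:       db  'B', 'M', 0, 0, 1, 2, 0, 3  ; arbitrary bytes with embedded zeros
blob_len    equ $ - blob                    ; 8, known at assembly time

section .text
global _start
_start:
        mov     eax, 4          ; sys_write
        mov     ebx, 1          ; stdout
        mov     ecx, blob
        mov     edx, blob_len   ; explicit length: all 8 bytes are written
        int     80h

        xor     ebx, ebx        ; exit status 0
        mov     eax, 1          ; sys_exit
        int     80h
```

A write that stopped at the first zero would emit only the first two bytes of this buffer. Piping the output through a hex dump tool shows all eight.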

Quote from: Frank Kotler on June 12, 2011, 07:38:10 PM
Why should we have to go looking for the null, when the length is known in most cases (not this one)? Zero-terminated strings are actually kind of a dumb data structure (IMO). Length-prefixed strings (a la Pascal) are usually more efficient! (programmer's choice again, except where we have to interface with C).

I completely agree about having the length of the string with the string itself. That is one of the few places where I tend to be "wasteful" and actually deal with string references rather than strings themselves.

Code: (strdemo.asm)
    ; build with : nasm -f elf strdemo.asm && gcc -nostartfiles -nostdlib strdemo.o -o strdemo
    bits 32

    __NR_exit   equ 1
    __NR_write  equ 4
    STDOUT      equ 1

    section .rodata
    ; Our DB Array's and size equates
    dbaHello: DB "Hello, World!", 10, 0
    dbaHello_size EQU ($-dbaHello-1)
    dbaBye: DB "Goodbye, World!", 10, 0
    dbaBye_size EQU ($-dbaBye-1)

    section .data
    ; "String" structures
    Hello:
        .length DD dbaHello_size
        .string DD dbaHello
    Bye:
        .length DD dbaBye_size
        .string DD dbaBye

    section .text
    global _start
    _start:
        mov edx, [Hello.length]
        mov ecx, [Hello.string]
        mov ebx, STDOUT
        mov eax, __NR_write
        int 0x80

        mov edx, [Bye.length]
        mov ecx, [Bye.string]
        mov ebx, STDOUT
        mov eax, __NR_write
        int 0x80

        xor ebx, ebx
        mov eax, __NR_exit
        int 0x80
In the above example I use $-label-1 because the length is 1 byte (the null terminator) less than the full ASCIIZ string. I still put the null terminator in strings simply because I might someday use that string with a C function, and as a good forward-thinking programmer I don't want the possible C functions to puke on me. :D

I started doing this when I was coding under windows and I wanted to keep all my strings themselves in the CONST section, later I just adapted the same practice to the .rodata section. It's mostly useful for dynamic strings where you might need to resize the allocated space and you can set .length to contain the max space allocated for that string (if reached just get more memory). Personally though, I think it's a good trade off, yeah I may be using up more space but I'll never have to do a null byte search. :P
Title: Re: Calculate length of a command line input...
Post by: JoeCoder on June 13, 2011, 08:39:27 AM: Ooops, yeah I was thinking only of text. You are right, that won't work for binary data. I have to keep telling myself, in *NIX everything is a file.

Thanks :-)