NASM - The Netwide Assembler
NASM Forum => Example Code => Topic started by: MediocreVeg1 on January 13, 2021, 06:32:21 AM
-
So I've been having a bit of difficulty with procedures, and the people on this forum helped a lot (and helped me understand 64-bit assembly in general). In return, I want to help others who may be struggling like I was by sharing the `puts' procedure I made. Its main advantage - not having to enter the length - requires another procedure, `strlen', which calculates the length of the string (not including the terminating 0 character). It takes rdi (which should hold the address of the string) as a parameter and returns the length in rax. So here it is:
strlen:
xor rax, rax ; Return value, will count string length
dec rdi
.cntloop:
inc rax
inc rdi
cmp byte [rdi], 0 ; Terminating character
jnz strlen.cntloop
dec rax ; So that it does not include the last terminating character
ret
Thanks to this, the actual printing procedure is really simple. It also takes rdi as a parameter and prints the string it points to:
puts:
mov rsi, rdi
call strlen
mov rdx, rax
mov rax, 1
mov rdi, rax
syscall
ret
Note that the string doesn't need to have a 0 character at the end for this to work, but it would be better if you did put it.
This isn't as advanced as the `puts' macro I made, which didn't require the argument to be a variable and could take any number of arguments, but it couldn't print variables in .bss, and too many macros is probably a bad idea. Anyway, do give me any suggestions to improve the code if you feel it could be better.
-
UPDATE: I realised an issue with the procedure related to the way section .data works. From what I can tell, if you do this:
achar db "H"
bchar db "F"
the variables will be stored in the memory side-by-side with no space in-between, so no terminating character in-between. This means that if you print achar, it'll print bchar right after that. Due to this, I HIGHLY RECOMMEND YOU PUT THE NULL CHARACTER YOURSELF AT THE END OF EACH STRING. Apart from that, the rest is fine.
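To make the run-on concrete, here is a small C sketch (C rather than NASM only so it can be compiled and checked anywhere). The names `scan_len`, `data_section`, `len_from_achar` and `len_from_bchar` are hypothetical; `scan_len` stands in for the `strlen' procedure above, and the array mimics the two db lines placed back to back.

```c
#include <stddef.h>

/* Hypothetical stand-in for the strlen procedure:
   count bytes up to (not including) the first 0 byte. */
size_t scan_len(const char *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Mimics `achar db "H"` immediately followed by `bchar db "F"`.
   The trailing 0 is an assumption added so the scan stays inside
   the array; a real .data section gives no such guarantee. */
static const char data_section[] = { 'H', 'F', 0 };

/* Length seen when scanning from achar: runs on into bchar. */
size_t len_from_achar(void) { return scan_len(&data_section[0]); }

/* Length seen when scanning from bchar. */
size_t len_from_bchar(void) { return scan_len(&data_section[1]); }
```

Scanning from achar reports 2 bytes instead of 1, which is exactly the "prints bchar right after" effect described above.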
-
Looks as if you've reinvented "gets". No protection against overflow. Take it away!
Best,
Frank
-
What do you mean by that? I actually haven't tried to make a `gets' without specifying the size because I'm not sure if that is possible with how I'm getting length currently.
-
Oh right, if you are talking about it running on into the next variable, yeah, it is a bit annoying. If you really can't put the 0 at the end yourself, you can override db to append a 0 automatically with this macro:
%macro db 1+
db %1, 0
%endmacro
(The plus will make it a greedy parameter so %1 will include all args)
If you don't want to override db, or don't want to do this with every dX instruction, you can rename the macro to something else too.
-
My Mistake. Sorry.
Best,
Frank
-
Oh no it's really fine, I agree that the thing can be quite annoying, no denying that. Perhaps I should make another procedure that could allow input for size (basically normal printing but a bit shorter to call). My strlen also seems a bit brute force-ish, looping through the whole string, so I might try to change the code for that too.
-
if you intend to mimic puts() function from libc, there are 2 things missing:
- it prints an extra '\n' at the end of the string;
- it returns a nonnegative value (glibc returns the number of chars written) or EOF (-1) in case of error.
A better approach:
extern strlen
; int puts_( char * );
;
; puts() returns the number of chars written or -1 on error.
;
; Entry: RDI points to string
align 16
puts_:
push rbx
mov rbx,rdi
; glibc strlen() is 100 times faster than that
; handmade routine.
call strlen wrt ..plt
mov edx,eax
mov rsi,rbx
mov ebx,edx ; save for later.
mov eax,1
mov edi,eax
syscall
test rax,rax
js .error
; puts() prints a final '\n'.
mov byte [rsp-8],`\n` ; Use the 'red-zone'.
mov eax,1
mov edi,eax
mov edx,eax
lea rsi,[rsp-8]
syscall
test eax,eax
js .error
lea eax,[rbx+1]
pop rbx
ret
.error:
mov eax,-1
pop rbx
ret
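For comparison, the same control flow can be sketched in C on top of the POSIX write() call. This is only an illustration under assumptions: the name `puts_sketch_fd` and the extra `fd` parameter are hypothetical (the assembly above always writes to descriptor 1), and, like the assembly version, the sketch does not retry short writes.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical C counterpart of puts_: write the string, then '\n'.
   Returns the number of bytes written (string + newline),
   or -1 if either write() call fails. */
int puts_sketch_fd(int fd, const char *s)
{
    size_t len = strlen(s);

    if (write(fd, s, len) < 0)      /* first syscall: the string itself */
        return -1;
    if (write(fd, "\n", 1) < 0)     /* second syscall: the final newline */
        return -1;

    return (int)len + 1;            /* same as `lea eax,[rbx+1]` */
}
```

Calling `puts_sketch_fd(1, "hi")` prints the string plus a newline and returns 3. Passing an invalid descriptor such as -1 makes write() fail with EBADF, which is one simple way to exercise the error path.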
-
I actually didn't intend to mimic the function, but the extra features do seem helpful. I agree about my strlen being slow too, maybe I'll try a different approach with scasb or something similar. There are definitely some things I need to learn from this example (I tried so much to print a character without making a variable and now I hear about this "red zone"!) and I will try to improve on this. I also will probably want to make another procedure for the newline in case you want to print, for example, a greeting and a variable <name> on the same line. Java has a println() and print() and I've always found this helpful.
Also, do you know what kind of errors could arise from puts?
-
[UPDATE] I basically remade my strlen procedure and I think it should be faster now. here's the code:
strlen:
mov rcx, 0xff ; A string may be longer than 255 characters, in which case this limit would have to increase
xor rbx, rbx
repnz scasb ; Will store in rbx
sub rdi, rbx
mov rax, rdi
dec rax ; So as not to include 0
ret
-
I've been trying to post this reply for the last 2 days.
Actually, my measurements were about a similar routine using rep/scasb against glibc's strlen().
-
The actual routine I've tested was this one:
bits 64
default rel
section .text
; size_t strlen_( char * );
global strlen_
strlen_:
xor eax,eax
mov ecx,-1 ; Limit size to 2³²-1 bytes long
repnz scasb
jnz .not_found
not ecx
dec ecx
mov eax,ecx
ret
.not_found:
mov rax,-1 ; Return maximum length if
; NUL char not found.
ret
-
Interesting, I'm not exactly sure how you've used not here (probably way faster than my sub alternative), but I'll try to figure it out. I'll also incorporate your error handling into my procedure. Thanks for the example!
As for the print statements, I did end up making separate puts and putsln procedures. Still not sure what kind of error could arise from them (I guess one could be that if strlen failed, it would return the same error code).
-
Interesting, I'm not exactly sure how you've used not here (probably way faster than my sub alternative), but I'll try to figure it out.
Not necessarily 'faster', but since 2³²-1-len is the same as ~len, I just used this fact to calculate the string length. (DEC ECX because we're excluding the final NUL char).
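The counter arithmetic can be checked with a small C model. This is a sketch, not the routine itself: `strlen_not_trick` is a hypothetical name, the variable `ecx` just mirrors the register, and the loop assumes a NUL is found before the counter runs out (the .not_found path is not modelled).

```c
#include <stdint.h>

/* Model of: mov ecx,-1 / repnz scasb / not ecx / dec ecx.
   The counter drops by one for every byte examined, NUL included. */
uint32_t strlen_not_trick(const char *s)
{
    uint32_t ecx = 0xFFFFFFFFu;   /* mov ecx,-1 */

    for (;;) {                    /* repnz scasb with AL = 0 */
        ecx--;                    /* one byte examined */
        if (*s++ == 0)            /* ZF set: stop on the NUL */
            break;
    }

    ecx = ~ecx;                   /* not ecx: (2^32-1) - ecx = bytes examined */
    ecx--;                        /* dec ecx: drop the NUL from the count */
    return ecx;
}
```

For "Hello", six bytes are examined, so the counter ends at 2³²-1-6; NOT turns that into 6 and DEC gives 5.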
I found your approach strange, since sub rdi,rbx (with rbx == 0) does nothing except affect the flags. Didn't you mean to use 'mov rbx,rdi' instead of 'xor rbx,rbx'?
PS: Try to use E?? registers as often as possible. Using R?? will imply a REX prefix and bigger instructions. Moving to E?? registers, or doing arithmetic/logical operations on them, will automatically zero the upper 32 bits of the R?? registers for you (this doesn't work in a few instructions, for example CDQ, which will zero upper RDX but not upper RAX).
PS: Notice in my routine, if '\0' isn't found in a block of 2³²-1 bytes it returns -1 (all bits set) in RAX. This allows you to test an error:
size_t size = strlen_( str );
if ( (long)size < 0 ) { ... handle error... }
-
I found your approach strange, since sub rdi,rbx (with rbx == 0) does nothing except affect the flags. Didn't you mean to use 'mov rbx,rdi' instead of 'xor rbx,rbx'?
Maybe I'm not understanding how scas works, but isn't the result of scasb stored in rbx in 64-bit assembly? That's why I cleared it with XOR before using scasb and why I subtracted it from rdi (which has starting address of string). I'm probably wrong here though.
PS: Try to use E?? registers as often as possible. Using R?? will imply a REX prefix and bigger instructions. Moving to E?? registers, or doing arithmetic/logical operations on them, will automatically zero the upper 32 bits of the R?? registers for you (this doesn't work in a few instructions, for example CDQ, which will zero upper RDX but not upper RAX).
Wouldn't the assembler get confused if I used 64-bit syscalls on 32-bit registers? Or if I put some arguments of a syscall in R?? registers and others in E?? registers?
PS: Notice in my routine, if '\0' isn't found in a block of 2³²-1 bytes it returns -1 (all bits set) in RAX. This allows you to test an error:
Yeah, I put it into my procedure as well after you showed your example. Wouldn't this event be highly unlikely though? I think 2^32-1 is like 4294967295 bytes so every single byte after the starting address of the string would have to be non-zero, right?
-
I found your approach strange, since sub rdi,rbx (with rbx == 0) does nothing except affect the flags. Didn't you mean to use 'mov rbx,rdi' instead of 'xor rbx,rbx'?
Maybe I'm not understanding how scas works, but isn't the result of scasb stored in rbx in 64-bit assembly? That's why I cleared it with XOR before using scasb and why I subtracted it from rdi (which has starting address of string). I'm probably wrong here though.
Nope... repnz scasb searches for AL in RDI and forward (increasing RDI and decreasing RCX). It won't change RBX.
PS: Try to use E?? registers as often as possible. Using R?? will imply a REX prefix and bigger instructions. Moving to E?? registers, or doing arithmetic/logical operations on them, will automatically zero the upper 32 bits of the R?? registers for you (this doesn't work in a few instructions, for example CDQ, which will zero upper RDX but not upper RAX).
Wouldn't the assembler get confused if I used 64-bit syscalls on 32-bit registers? Or if I put some arguments of a syscall in R?? registers and others in E?? registers?
It won't get confused because E?? registers are part of R?? registers. When you use a R?? register NASM is forced to add a prefix to the instruction (REX prefix). This prefix isn't added if you use E?? registers. Of course, if the argument is an address (a pointer) you are forced to use R??.
PS: Notice in my routine, if '\0' isn't found in a block of 2³²-1 bytes it returns -1 (all bits set) in RAX. This allows you to test an error:
Yeah, I put it into my procedure as well after you showed your example. Wouldn't this event be highly unlikely though? I think 2^32-1 is like 4294967295 bytes so every single byte after the starting address of the string would have to be non-zero, right?
Yep... The problem when you set RCX to -1 is that, theoretically, you can search an 18446744073709551615-byte-long string. If such a string exists (not possible), then REP SCASB would take 192 YEARS to complete (@ 3 GHz). If you restrict the string to 2³²-1 bytes and such a string exists (improbable), it would take only 1.5 seconds to scan.
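A back-of-the-envelope version of that estimate, as a C sketch. The helper names `scan_seconds` and `seconds_to_years` are hypothetical, and the whole calculation ASSUMES the scan handles one byte per clock cycle, which is an illustrative assumption rather than a measured figure for REP SCASB.

```c
/* Seconds needed to scan `bytes` bytes at `hz` cycles per second,
   ASSUMING one byte per cycle (an assumption, not a measurement). */
double scan_seconds(double bytes, double hz)
{
    return bytes / hz;
}

/* Convert seconds to years, using 365.25-day years. */
double seconds_to_years(double seconds)
{
    return seconds / (365.25 * 24.0 * 3600.0);
}
```

Under that assumption, `scan_seconds(4294967295.0, 3e9)` comes out at roughly 1.4 seconds and `seconds_to_years(scan_seconds(18446744073709551615.0, 3e9))` at roughly 195 years, the same order of magnitude as the figures quoted above.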
-
Nope... repnz scasb searches for AL in RDI and forward (increasing RDI and decreasing RCX). It won't change RBX.
Oh man, I'm so stupid. How does my procedure even work then? Sheesh, now I'm even confused as to how my own procedure works :P
It won't get confused because E?? registers are part of R?? registers. When you use a R?? register NASM is forced to add a prefix to the instruction (REX prefix). This prefix isn't added if you use E?? registers. Of course, if the argument is an address (a pointer) you are forced to use R??
Oh, I see. Would that mean that I would be able to enter the call number in eax too? And for example in the second argument of a system call, would I use ebx or edi for 32-bit registers?
Yep... The problem when you set RCX to -1 is that, theoretically, you can search an 18446744073709551615-byte-long string. If such a string exists (not possible), then REP SCASB would take 192 YEARS to complete (@ 3 GHz). If you restrict the string to 2³²-1 bytes and such a string exists (improbable), it would take only 1.5 seconds to scan.
Wow, that's... a lot. Anyway, I think this answers my question. Thanks.
-
Oh man, I'm so stupid. How does my procedure even work then? Sheesh, now I'm even confused as to how my own procedure works :P
I confess I cannot see how it works, either... ;)
Oh, I see. Would that mean that I would be able to enter the call number in eax too? And for example in the second argument of a system call, would I use ebx or edi for 32-bit registers?
A simple example. If you want to print a string you can do:
bits 64
default rel ; Use RIP relative addressing.
section .rodata
msg: db `Hello\n`
msg_len equ $ - msg
section .text
global writemsg
writemsg:
mov eax,1 ; EAX, instead of RAX will zero upper 32 bits.
mov edi,eax ; RDI=STDOUT_FILENO (upper 32 bits zeroed).
lea rsi,[msg] ; Need to load RSI (because it is a pointer). LEA because 'msg' is RIP relative.
mov edx,msg_len ; RDX (upper 32 bits zeroed).
syscall
ret
Those E?? movs are shorter than if you use R?? regs.
[]s
Fred
-
I confess I cannot see how it works, either... ;)
Yeah, I think I'll change my thing to your not solution so I at least understand what is going on.
And about the registers, I think I finally get it now. Thanks for all the help!
-
Just getting my feet wet with 64 bits (yes, my very first code) :D. If I understand correctly, there is some optimization to be gained from using 32-bit registers for the string length (which would probably never exceed 4 GB).
As usual, no C externals here. ;D
; nasm -f elf64 puts.asm
; ld -m elf_x86_64 puts.o -o puts
bits 64
section .text
global _start
_start:
mov rdi, msg ; string
call strlen ; length
test rax, rax ; anything?
jz .__out
call puts ; show it
.__out:
mov rax, 0x3c
xor rdi, rdi
syscall
puts:
mov rsi, rdi ; string
mov rdi, 1 ; stdout
mov rdx, rax ; length (from strlen)
mov rax, 1 ; write
syscall
ret
strlen:
push rdi ; save address
sub rcx, rcx
not rcx ; rcx = -1
xor eax, eax
cld ; count forward
repne scasb
not rcx
lea rax, [rcx - 1] ; length
pop rdi
ret
section .data
msg db "Hello world!", 10, 0
-
Maybe I'm not understanding how scas works, but isn't the result of scasb stored in rbx in 64-bit assembly? That's why I cleared it with XOR before using scasb and why I subtracted it from rdi (which has starting address of string). I'm probably wrong here though.
SCASB reads from ES:RDI and compares with AL, affecting the flags, and updates RDI. With the REPNZ prefix it repeats up to RCX times while ZF=0 (hence the NZ). So strlen could be implemented as:
; Same as: size_t strlen( const char * );
; the function assumes ALL strings will be NUL terminated.
strlen_:
xor eax,eax
lea ecx,[rax-1] ; Limiting the string size to 2³²-1, max.
mov rdx,rdi
repnz scasb ; Scan for '\0'...
sub rdi,rdx
mov rax,rdi ; returns size in RAX.
ret
Wouldn't the assembler get confused if I used 64-bit syscalls on 32-bit registers? Or if I put some arguments of a syscall in R?? registers and others in E?? registers?
E?? registers are the lower part of R?? registers. And, in x86-64 mode, when you change an E?? register, the upper 32 bits of the R?? register are automatically zeroed... Instructions using R?? registers need a prefix (the REX prefix); with E?? there is no prefix...
PS: Notice in my routine, if '\0' isn't found in a block of 2³²-1 bytes it returns -1 (all bits set) in RAX. This allows you to test an error:
... Wouldn't this event be highly unlikely though? ...
You are right!... It is easier to assume the routine expects ALL strings to be zero terminated...
-
So strlen could be implemented as:
; Same as: size_t strlen( const char * );
; the function assumes ALL strings will be NUL terminated.
strlen_:
xor eax,eax
lea ecx,[rax-1] ; Limiting the string size to 2³²-1, max.
mov rdx,rdi
repnz scasb ; Scan for '\0'...
sub rdi,rdx
mov rax,rdi ; returns size in RAX.
ret
That would return string length + 1. So alternatively:
xor eax, eax
lea ecx, [rax - 1]
mov rdx, rdi
repnz scasb
sub rdi, rdx
;mov rax, rdi
lea rax, [rdi - 1] ; not counting the null terminator
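The off-by-one is easy to see in a C model of the pointer arithmetic. This is a sketch with hypothetical names (`bytes_scanned`, `strlen_fixed`); the local `rdi` only mirrors the register, and the string is assumed to be NUL terminated.

```c
#include <stddef.h>

/* Model of: mov rdx,rdi / repnz scasb / sub rdi,rdx.
   scasb advances RDI even on the byte that matches, so the
   difference counts the NUL terminator as well. */
size_t bytes_scanned(const char *s)
{
    const char *rdi = s;

    while (*rdi++ != 0)         /* post-increment: steps past the NUL too */
        ;
    return (size_t)(rdi - s);   /* string length + 1 */
}

/* Model of the corrected ending: lea rax,[rdi-1] after the subtraction. */
size_t strlen_fixed(const char *s)
{
    return bytes_scanned(s) - 1;
}
```

For "abc" the scan covers 4 bytes, so the uncorrected routine returns 4 and the corrected one returns 3.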
-
That would return string length + 1....
Oops... sorry... my bad...
-
You can just iterate, displaying one character at a time until you reach the \0.
That way you can display a gigabyte or more of string data and stop when you hit the \0.
-
Maybe I'm not understanding how scas works, but isn't the result of scasb stored in rbx in 64-bit assembly?
Nope! scasb tests AL against ES:[RDI], setting the flags. repnz scasb does the same, RCX times while ZF=0.
Wouldn't the assembler get confused if I used 64-bit syscalls on 32-bit registers? Or if I put some arguments of a syscall in R?? registers and others in E?? registers?
Nope! E?? registers are the lower 32 bits of R?? registers. When you use a R?? register the instruction is prefixed with a REX prefix (and an immediate can be bigger), like, for example:
mov eax,-1 ; B8 FF FF FF FF
mov rax,-1 ; 48 C7 C0 FF FF FF FF (with NASM's default optimization; 48 B8 FF FF FF FF FF FF FF FF if the full 64-bit immediate is encoded)
When using EAX the upper 32 bits are automagically (hehe) zeroed.
Yeah, I put it into my procedure as well after you showed your example. Wouldn't this event be highly unlikely though? I think 2^32-1 is like 4294967295 bytes so every single byte after the starting address of the string would have to be non-zero, right?
That's why it doesn't make sense using R?? registers to hold string lengths...
[]s
Fred