NASM - The Netwide Assembler

NASM Forum => Example Code => Topic started by: MediocreVeg1 on January 13, 2021, 06:32:21 AM

Title: My own 64-bit `puts' instruction (No length required)
Post by: MediocreVeg1 on January 13, 2021, 06:32:21 AM
So I've been having a bit of difficulty with procedures, and the people on this forum helped a lot (and helped me understand 64-bit assembly in general). So in return, I want to help others who may be having similar difficulty, with the `puts' procedure I made. Its main advantage, not having to pass the length, relies on another procedure, `strlen', which calculates the length of the string (not including the terminating 0 character). It takes rdi (which should point to the string) as a parameter and returns the length in rax. So here it is:
Code:
strlen:
    xor rax, rax ; Return value, will count string length
    dec rdi
.cntloop:
    inc rax
    inc rdi
    cmp byte [rdi], 0 ; Terminating character
    jnz strlen.cntloop

    dec rax ; So that it does not include the last terminating character
    ret
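For anyone more used to C, the loop above boils down to the following. This is only a sketch: `count_strlen` is a hypothetical name, and, like the assembly, it assumes the string really ends in a 0 byte.

```c
#include <stddef.h>

/* Hypothetical C counterpart of the strlen loop above: walk forward one
   byte at a time and count until the terminating 0, which is not counted. */
size_t count_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')   /* same test as: cmp byte [rdi], 0 */
        n++;
    return n;
}
```

count_strlen("Hello") gives 5 and count_strlen("") gives 0, matching what the assembly leaves in rax.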
Thanks to this, the actual printing procedure is really simple. It also takes rdi as a parameter and prints the string it points to:
Code:
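puts:
    mov rsi, rdi ; rsi = address of the string (write's buffer argument)
    call strlen  ; rax = length of the string
    mov rdx, rax ; rdx = number of bytes to write
    mov rax, 1   ; rax = 1, the sys_write call number
    mov rdi, rax ; rdi = 1, stdout
    syscall
    ret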
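In C terms, the procedure is strlen() followed by a write() to file descriptor 1 (stdout). A minimal sketch, assuming a POSIX system; `puts_nolen` is a hypothetical name, and strlen() still needs the terminating 0 to know where to stop.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical C sketch of the puts procedure above: compute the length
   with strlen(), then call write(1, s, len), i.e. sys_write on stdout.
   Returns whatever write() returns (bytes written, or -1 on error). */
ssize_t puts_nolen(const char *s)
{
    size_t len = strlen(s);                 /* call strlen; mov rdx, rax */
    return write(STDOUT_FILENO, s, len);    /* rax = 1, rdi = 1, rsi = s */
}
```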

Note that the string doesn't need to have a 0 character at the end for this to work, but it would be better if you did put it.

This isn't as advanced as the `puts' macro I made, which didn't require the argument to be a variable and could take any number of arguments, but that macro couldn't print variables from .bss, and too many macros is probably a bad idea. Anyway, please give me any suggestions to improve the code if you feel it could be better.
Title: Re: My own 64-bit `puts' instruction (No length required)
Post by: MediocreVeg1 on January 13, 2021, 07:47:26 AM
UPDATE: I realised an issue with the procedure related to the way section .data works. From what I can tell, if you do this:
Code:
achar db "H"
bchar db "F"
the variables will be stored in memory side by side with nothing in between, so there is no terminating character between them. This means that if you print achar, bchar will be printed right after it. Due to this, I HIGHLY RECOMMEND YOU PUT THE NULL CHARACTER YOURSELF AT THE END OF EACH STRING. Apart from that, the rest is fine.
Title: Re: My own 64-bit `puts' instruction (No length required)
Post by: Frank Kotler on January 13, 2021, 09:23:09 PM
Looks as if you've reinvented "gets". No protection against overflow.  Take it away!

Best,
Frank

Title: Re: My own 64-bit `puts' instruction (No length required)
Post by: MediocreVeg1 on January 14, 2021, 06:50:31 AM
What do you mean by that? I actually haven't tried to make a `gets' without specifying the size because I'm not sure if that is possible with how I'm getting length currently.
Title: Re: My own 64-bit `puts' instruction (No length required)
Post by: MediocreVeg1 on January 14, 2021, 06:57:51 AM
Oh right, if you are talking about the running-into-the-next-variable thing, yeah, it is a bit annoying. If you really can't put the 0 at the end yourself, you can override db with this macro so that it appends a 0 automatically:
Code:
%macro db 1+
    db %1, 0
%endmacro
(The plus will make it a greedy parameter so %1 will include all args)
If you don't want to override db, or don't want this for every dx directive, you can give the macro a different name instead.
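For comparison, C does this for you: every string literal gets a hidden 0 byte appended, much like the overridden db above. A small sketch (the array and function names are hypothetical) showing the size difference between the two forms:

```c
#include <stddef.h>

/* A C string literal always gets a hidden 0 byte appended, which is what
   the overridden db macro above does for NASM data. */
static const char with_nul[] = "H";         /* like: db "H", 0  -> 2 bytes */
static const char without_nul[1] = { 'H' }; /* like: db "H"     -> 1 byte  */

size_t size_with_nul(void)    { return sizeof with_nul; }
size_t size_without_nul(void) { return sizeof without_nul; }
```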
Title: Re: My own 64-bit `puts' instruction (No length required)
Post by: Frank Kotler on January 14, 2021, 10:33:51 PM
My mistake. Sorry.

Best,
Frank

Title: Re: My own 64-bit `puts' instruction (No length required)
Post by: MediocreVeg1 on January 15, 2021, 08:53:12 AM
Oh no it's really fine, I agree that the thing can be quite annoying, no denying that. Perhaps I should make another procedure that could allow input for size (basically normal printing but a bit shorter to call). My strlen also seems a bit brute force-ish, looping through the whole string, so I might try to change the code for that too.
Title: Re: My own 64-bit `puts' instruction (No length required)
Post by: fredericopissarra on January 15, 2021, 03:36:20 PM
If you intend to mimic the puts() function from libc, there are 2 things missing:

- puts() returns the number of chars written, or -1 on error.
- puts() prints a final '\n'.

A better approach:
Code:
  extern strlen

; int puts_( char * );
;
; puts() returns the number of chars written, or -1 on error.
;
; Entry: RDI points to string
  align 16
puts_:
  push  rbx

  mov   rbx,rdi

  ; glibc strlen() is 100 times faster than that
  ; handmade routine.
  call  strlen wrt ..plt
 
  mov   edx,eax
  mov   rsi,rbx
  mov   ebx,edx     ; save for later.
  mov   eax,1
  mov   edi,eax
  syscall

  test  rax,rax
  js    .error

  ; puts() prints a final '\n'.
  mov   byte [rsp-8],`\n`     ; Use the 'red-zone'.
  mov   eax,1
  mov   edi,eax
  mov   edx,eax
  lea   rsi,[rsp-8]
  syscall
 
  test  eax,eax
  js    .error

  lea   eax,[rbx+1]
  pop   rbx
  ret

.error:
  mov   eax,-1
  pop   rbx
  ret
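Roughly the same logic in C, as a hedged sketch assuming a POSIX system (`puts_like` is a hypothetical name). A local variable stands in for the red-zone byte, and, like the assembly, it does not retry if write() writes fewer bytes than asked:

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical C rendering of puts_ above: write the string, then a
   single '\n', and return the number of bytes written (length + 1),
   or -1 if either write() fails. Partial writes are not retried. */
int puts_like(const char *s)
{
    size_t len = strlen(s);
    const char nl = '\n';                 /* the asm keeps this in the red zone */

    if (write(STDOUT_FILENO, s, len) < 0)
        return -1;
    if (write(STDOUT_FILENO, &nl, 1) < 0)
        return -1;
    return (int)(len + 1);                /* lea eax, [rbx+1] */
}
```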
Title: Re: My own 64-bit `puts' instruction (No length required)
Post by: MediocreVeg1 on January 15, 2021, 03:48:32 PM
I actually didn't intend to mimic the function, but the extra features do seem helpful. I agree that my strlen is slow too; maybe I'll try a different approach with scasb or something similar. There are definitely some things I need to learn from this example (I tried so hard to print a character without making a variable, and now I hear about this "red zone"!), and I will try to improve on this. I will probably also want to make another procedure for the newline, in case you want to print, for example, a greeting and a variable <name> on the same line. Java has println() and print(), and I've always found that helpful.
Also, do you know what kind of errors could arise from puts?
Title: Re: My own 64-bit `puts' instruction (No length required)
Post by: MediocreVeg1 on January 15, 2021, 05:34:19 PM
[UPDATE] I basically remade my strlen procedure, and I think it should be faster now. Here's the code:
Code:
strlen:
    mov rcx, 0xff ; (A string may be longer than 255 characters, in which case this would have to increase)
    xor rbx, rbx
    repnz scasb ; Will store in rbx
    sub rdi, rbx
    mov rax, rdi
    dec rax ; So as not to include 0
    ret
Title: Re: My own 64-bit `puts' instruction (No length required)
Post by: fredericopissarra on January 17, 2021, 03:36:53 PM
I've been trying to post this reply for the last 2 days.

Actually, my measurements compared a similar routine using rep scasb against glibc's strlen().
Title: Re: My own 64-bit `puts' instruction (No length required)
Post by: fredericopissarra on January 17, 2021, 04:07:00 PM
The actual routine I've tested was this one:
Code:
  bits  64
  default rel

  section .text

; size_t strlen_( char * );
  global  strlen_
strlen_:
  xor   eax,eax
  mov   ecx,-1      ; Limit size to 2³²-1 bytes long
  repnz scasb
  jnz   .not_found
  not   ecx
  dec   ecx
  mov   eax,ecx
  ret
.not_found:
  mov   rax,-1      ; Return maximum length if
                    ; NUL char not found.
  ret
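To make the NOT arithmetic concrete, here is a C model of strlen_ that tracks ECX the way repnz scasb does. It is a hedged sketch: `strlen_model` is a hypothetical name, the loop stands in for the single instruction, and (as in the routine) RCX only holds 2³²-1 because mov ecx,-1 zeroes its upper half.

```c
#include <stdint.h>

/* Hypothetical C model of strlen_ above. ECX starts at 0xFFFFFFFF and
   repnz scasb decrements it once per byte examined (including the 0 it
   stops on). If the 0 is found, ~ECX - 1 is the length; if ECX runs out
   first, strlen_ returns -1 (all bits set). */
uint64_t strlen_model(const unsigned char *p)
{
    uint32_t ecx = 0xFFFFFFFFu;          /* mov ecx, -1 */
    int zf = 0;

    while (ecx != 0) {                   /* repnz scasb (AL = 0) */
        ecx--;
        zf = (*p++ == 0);                /* compare AL with [RDI]; RDI++ */
        if (zf)
            break;
    }
    if (!zf)
        return UINT64_MAX;               /* .not_found: mov rax, -1 */
    ecx = ~ecx;                          /* not ecx -> bytes scanned */
    ecx--;                               /* dec ecx: drop the 0 byte */
    return ecx;                          /* mov eax, ecx (zero-extends) */
}
```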
Title: Re: My own 64-bit `puts' instruction (No length required)
Post by: MediocreVeg1 on January 18, 2021, 08:51:33 AM
Interesting, I'm not exactly sure how you've used not here (probably way faster than my sub alternative), but I'll try to figure it out. I'll also incorporate your error handling into my procedure. Thanks for the example!
As for the print statements, I did end up making separate puts and putsln procedures. I'm still not sure what kind of error could arise from them (I guess one could be that if strlen failed, puts would return the same error code).
Title: Re: My own 64-bit `puts' instruction (No length required)
Post by: fredericopissarra on January 18, 2021, 02:22:02 PM
Quote
Interesting, I'm not exactly sure how you've used not here (probably way faster than my sub alternative), but I'll try to figure it out.
Not necessarily 'faster', but since 2³²-1-x is the same as ~x for any 32-bit x, NOT turns the final ECX (2³²-1 minus the number of bytes scanned) back into that count. (DEC ECX because the count includes the final NUL char, which we exclude.)
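Spelled out for a string of n characters (so n+1 bytes are scanned, counting the NUL), the arithmetic is:

```latex
\begin{aligned}
\mathrm{ECX}_0 &= 2^{32}-1 && \text{(mov ecx, -1)}\\
\mathrm{ECX}_1 &= (2^{32}-1)-(n+1) && \text{($n$ chars + NUL scanned)}\\
\lnot\,\mathrm{ECX}_1 &= (2^{32}-1)-\mathrm{ECX}_1 = n+1 && \text{(not ecx)}\\
\lnot\,\mathrm{ECX}_1 - 1 &= n && \text{(dec ecx)}
\end{aligned}
```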

I found your approach strange, since sub rdi,rbx (with rbx == 0) does nothing except affect the flags. Didn't you mean to use 'mov rbx,rdi' instead of 'xor rbx,rbx'?

PS: Try to use E?? registers whenever possible. Using R?? registers implies a REX prefix and bigger instructions. Moving to, or doing arithmetic/logical operations on, E?? registers automatically zeroes the upper 32 bits of the R?? registers for you (this only applies to the registers an instruction actually writes: for example, CDQ zeroes the upper half of RDX but not of RAX, because it writes EDX and only reads EAX).

PS: Notice that in my routine, if '\0' isn't found within a block of 2³²-1 bytes, it returns -1 (all bits set) in RAX. This allows you to test for an error:
Code:
size_t size = strlen_( str );
if ( (long)size < 0 ) { ... handle error... }
Title: Re: My own 64-bit `puts' instruction (No length required)
Post by: MediocreVeg1 on January 18, 2021, 03:06:55 PM
Quote
I found strange your approach, since sub rdi,rbx (with rbx == 0) do nothing, except affect the flags. Din't you mean to use 'mov rbx,rdi' instead of 'xor rbx,rbx'?
Maybe I'm not understanding how scas works, but isn't the result of scasb stored in rbx in 64-bit assembly? That's why I cleared it with XOR before using scasb and why I subtracted it from rdi (which has the starting address of the string). I'm probably wrong here though.

Quote
PS: Try to use E?? registers as many times as possible. Using R?? will imply a REX prefix and bigger instructions. moving and doing arithmetic/logical operations to E?? registers will automatically zero the upper 32 bits of R?? registers for you (this doen't work in a few instructions as, for example, CDQ [will zero upper RDX but not upper RAX).
Wouldn't the assembler get confused if I used 64-bit syscalls on 32-bit registers? Or if I put some arguments of a syscall in R?? registers and others in E?? registers?

Quote
PS: Notice in my routine, if '\0' isn't found in a block of 2³²-1 bytes it returns -1 (all bits set) in RAX. This allows you to test an error:
Yeah, I put it into my procedure as well after you showed your example. Wouldn't this event be highly unlikely though? I think 2^32-1 is like 4294967295 bytes so every single byte after the starting address of the string would have to be non-zero, right?
Title: Re: My own 64-bit `puts' instruction (No length required)
Post by: fredericopissarra on January 18, 2021, 06:40:38 PM
Quote
I found strange your approach, since sub rdi,rbx (with rbx == 0) do nothing, except affect the flags. Din't you mean to use 'mov rbx,rdi' instead of 'xor rbx,rbx'?
Maybe I'm not understanding how scas works, but isn't the result of scasb stored in rbx in 64-bit assembly? That's why I cleared it with XOR before using scasb and why I subtracted it from rdi (which has the starting address of the string). I'm probably wrong here though.
Nope... repnz scasb compares AL with the byte at [RDI] and moves forward (incrementing RDI and decrementing RCX) until it finds a match or RCX reaches 0. It won't change RBX.
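A register-level model in C may make this clearer. It is a hedged sketch, not how the CPU implements it: `struct regs`, `repnz_scasb` and `len_via_scasb` are hypothetical names, and DF is assumed clear (as the System V ABI guarantees on function entry). It also shows the fix suggested earlier: keep the start address in a spare register, subtract, and drop the NUL.

```c
#include <stdint.h>

/* Hypothetical register-level model of `repnz scasb` in 64-bit mode:
   compare AL with the byte at [RDI], advance RDI, decrement RCX, and
   stop when a match sets ZF or RCX reaches 0. No other register
   (RBX included) is touched. Returns the final ZF. */
struct regs { uint64_t rcx; const unsigned char *rdi; uint8_t al; };

int repnz_scasb(struct regs *r)
{
    int zf = 0;
    while (r->rcx != 0) {
        zf = (*r->rdi == r->al);   /* scasb: compare AL with [RDI] */
        r->rdi++;                  /* DF = 0, so RDI moves forward */
        r->rcx--;
        if (zf)
            break;                 /* the NZ in repnz: stop on a match */
    }
    return zf;
}

/* With the start kept in a spare register (mov rbx, rdi), end - start
   also counts the 0 byte, hence the final dec. */
uint64_t len_via_scasb(const char *s)
{
    struct regs r = { UINT64_MAX, (const unsigned char *)s, 0 };
    const unsigned char *start = r.rdi;    /* mov rbx, rdi */
    repnz_scasb(&r);
    return (uint64_t)(r.rdi - start) - 1;  /* sub rdi, rbx ; dec */
}
```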

Quote
Quote
PS: Try to use E?? registers as many times as possible. Using R?? will imply a REX prefix and bigger instructions. moving and doing arithmetic/logical operations to E?? registers will automatically zero the upper 32 bits of R?? registers for you (this doen't work in a few instructions as, for example, CDQ [will zero upper RDX but not upper RAX).
Wouldn't the assembler get confused if I used 64-bit syscalls on 32-bit registers? Or if I put some arguments of a syscall in R?? registers and others in E?? registers?
It won't get confused because E?? registers are part of R?? registers. When you use a R?? register NASM is forced to add a prefix to the instruction (REX prefix). This prefix isn't added if you use E?? registers. Of course, if the argument is an address (a pointer) you are forced to use R??.

Quote
Quote
PS: Notice in my routine, if '\0' isn't found in a block of 2³²-1 bytes it returns -1 (all bits set) in RAX. This allows you to test an error:
Yeah, I put it into my procedure as well after you showed your example. Wouldn't this event be highly unlikely though? I think 2^32-1 is like 4294967295 bytes so every single byte after the starting address of the string would have to be non-zero, right?
Yep... The problem when you set RCX to -1 is that, theoretically, you can search an 18446744073709551615-byte string. If such a string existed (not possible), REP SCASB would take about 195 YEARS to complete (at 3 GHz, one byte per cycle). If you restrict the string to 2³²-1 bytes, then even if such a string exists (improbable), it would take only about 1.4 seconds to scan.
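Assuming an idealized one byte scanned per clock cycle at 3 GHz (real REP SCASB throughput varies by CPU), the estimates work out as:

```latex
\begin{aligned}
\frac{2^{64}-1\ \text{bytes}}{3\times10^{9}\ \text{bytes/s}} &\approx 6.15\times10^{9}\ \text{s} \approx 195\ \text{years}\\
\frac{2^{32}-1\ \text{bytes}}{3\times10^{9}\ \text{bytes/s}} &\approx 1.43\ \text{s}
\end{aligned}
```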
Title: Re: My own 64-bit `puts' instruction (No length required)
Post by: MediocreVeg1 on January 19, 2021, 11:31:18 AM
Quote
Nope... repnz scasb searches for AL in RDI and forward (increasing RDI and decreasing RCX). It won't change RBX.
Oh man, I'm so stupid. How does my procedure even work then? Sheesh, now I'm even confused as to how my own procedure works :P

Quote
It won't get confused because E?? registers are part of R?? registers. When you use a R?? register NASM is forced to add a prefix to the instruction (REX prefix). This prefix isn't added if you use E?? registers. Of course, if the argument is an address (a pointer) you are forced to use R??
Oh, I see. Would that mean that I would be able to enter the call number in eax too? And for example in the second argument of a system call, would I use ebx or edi for 32-bit registers?

Quote
Yep... The problem when you set RCX to -1 is that, theoretically, you can search an 18446744073709551615-byte string. If such a string existed (not possible), REP SCASB would take about 195 YEARS to complete (at 3 GHz, one byte per cycle). If you restrict the string to 2³²-1 bytes, then even if such a string exists (improbable), it would take only about 1.4 seconds to scan.
Wow, that's... a lot. Anyway, I think this answers my question. Thanks.
Title: Re: My own 64-bit `puts' instruction (No length required)
Post by: fredericopissarra on January 19, 2021, 05:13:06 PM
Quote from: MediocreVeg1
Oh man, I'm so stupid. How does my procedure even work then? Sheesh, now I'm even confused as to how my own procedure works :P
I confess I can't see how it works either... ;)

Quote
Oh, I see. Would that mean that I would be able to enter the call number in eax too? And for example in the second argument of a system call, would I use ebx or edi for 32-bit registers?
A simple example. If you want to print a string you can do:
Code:
  bits 64
  default rel   ; Use RIP relative addressing.

  section .rodata

msg: db `Hello\n`
msg_len equ $ - msg

  section .text

  global writemsg
writemsg:
  mov eax,1    ; EAX, instead of RAX will zero upper 32 bits.
  mov edi,eax  ; RDI=STDOUT_FILENO (upper 32 bits zeroed).
  lea rsi,[msg] ; Need to load RSI (because it is a pointer). LEA because 'msg' is RIP relative.
  mov edx,msg_len ; RDX (upper 32 bits zeroed).
  syscall
  ret

Those E?? movs are shorter than if you use R?? regs.

[]s
Fred
Title: Re: My own 64-bit `puts' instruction (No length required)
Post by: MediocreVeg1 on January 19, 2021, 05:27:09 PM
Quote
I confess I can't see how it works either...  ;)
Yeah, I think I'll change my version to your NOT solution so I at least understand what is going on.

And about the registers, I think I finally get it now. Thanks for all the help!
Title: Re: My own 64-bit `puts' instruction (No length required)
Post by: munair on July 22, 2023, 09:36:19 AM
Just getting my feet wet with 64 bits (yes, my very first code)  :D. If I understand correctly, there is some optimization to be gained from using 32-bit registers for the string length (which would probably never exceed 4GB).

As usual, no C externals here.  ;D
Code:
; nasm -f elf64 puts.asm
; ld -m elf_x86_64 puts.o -o puts

bits 64

section .text

    global _start

_start:
    mov     rdi, msg            ; string
    call    strlen              ; length
    test    rax, rax            ; anything?
    jz      .__out
    call    puts                ; show it
  .__out:
    mov     rax, 0x3c           ; sys_exit (60)
    xor     rdi, rdi            ; exit status 0
    syscall

puts:
    mov     rsi, rdi            ; string
    mov     rdi, 1              ; stdout
    mov     rdx, rax            ; length (from strlen)
    mov     rax, 1              ; write
    syscall
    ret

strlen:
    push    rdi                 ; save address
    sub     rcx, rcx
    not     rcx                 ; rcx -1
    xor     eax, eax
    cld                         ; count forward
    repne   scasb
    not     rcx
    lea     rax, [rcx - 1]      ; length
    pop     rdi
    ret

section .data
    msg db "Hello world!", 10, 0
Title: Re: My own 64-bit `puts' instruction (No length required)
Post by: fredericopissarra on July 22, 2023, 11:46:12 AM
Quote
Maybe I'm not understanding how scas works, but isn't the result of scasb stored in rbx in 64-bit assembly? That's why I cleared it with XOR before using scasb and why I subtracted it from rdi (which has the starting address of the string). I'm probably wrong here though.
SCASB reads the byte at ES:RDI, compares it with AL (affecting the flags), and updates RDI. With the REPNZ prefix it repeats this up to RCX times while ZF=0 (hence the NZ). So strlen could be implemented as:
Code:
; Same as: size_t strlen( const char * );
; the function assumes ALL strings will be NUL terminated.
strlen_:
  xor eax,eax
  lea ecx,[rax-1]   ; Limiting the string size to 2³²-1, max.
  mov rdx,rdi
  repnz scasb     ; Scan for '\0'...
  sub rdi,rdx
  mov rax,rdi     ; returns size in RAX.
  ret

Quote
Wouldn't the assembler get confused if I used 64-bit syscalls on 32-bit registers? Or if I put some arguments of a syscall in R?? registers and others in E?? registers?
E?? registers are the lower half of R?? registers. And, in x86-64 mode, when you write to an E?? register, the upper 32 bits of the R?? register are automatically zeroed... Instructions using R?? registers need an extra prefix (the REX prefix); with E?? registers, no prefix...

Quote
Notice in my routine, if '\0' isn't found in a block of 2³²-1 bytes it returns -1 (all bits set) in RAX. This allows you to test an error:
... Wouldn't this event be highly unlikely though? ...
You are right!... It's easier to assume the routine expects ALL strings to be zero-terminated...
Title: Re: My own 64-bit `puts' instruction (No length required)
Post by: munair on July 22, 2023, 03:38:17 PM
Quote
So strlen could be implemented as:
Code:
; Same as: size_t strlen( const char * );
; the function assumes ALL strings will be NUL terminated.
strlen_:
  xor eax,eax
  lea ecx,[rax-1]   ; Limiting the string size to 2³²-1, max.
  mov rdx,rdi
  repnz scasb     ; Scan for '\0'...
  sub rdi,rdx
  mov rax,rdi     ; returns size in RAX.
  ret

That would return string length + 1. So alternatively:

Code:
strlen_:
    xor     eax, eax
    lea     ecx, [rax - 1]
    mov     rdx, rdi
    repnz   scasb
    sub     rdi, rdx
    ;mov     rax, rdi
    lea     rax, [rdi - 1]        ; not counting the null terminator
    ret
Title: Re: My own 64-bit `puts' instruction (No length required)
Post by: fredericopissarra on July 22, 2023, 07:10:09 PM
Quote
That would return string length + 1....
Oops... sorry... my bad...
Title: Re: My own 64-bit `puts' instruction (No length required)
Post by: alCoPaUL on October 03, 2023, 05:08:47 PM
You can just iterate, displaying one letter at a time until you reach \0.

That way you can display 1 gigabyte or more worth of strings and stop when you hit \0.
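A minimal C sketch of that idea, assuming a POSIX system (`put_chars` is a hypothetical name). It needs no length up front, but it costs one system call per character:

```c
#include <unistd.h>

/* Hypothetical sketch of the char-by-char approach: issue one write()
   per byte until the terminating 0, so no length is computed first.
   Returns the number of bytes written, or -1 on error. */
long put_chars(const char *s)
{
    long count = 0;
    for (; *s != '\0'; s++) {
        if (write(STDOUT_FILENO, s, 1) != 1)
            return -1;
        count++;
    }
    return count;
}
```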
Title: Re: My own 64-bit `puts' instruction (No length required)
Post by: fredericopissarra on October 05, 2023, 03:42:59 PM
Quote from: MediocreVeg1
Maybe I'm not understanding how scas works, but isn't the result of scasb stored in rbx in 64-bit assembly?
Nope! scasb compares AL with the byte at ES:[RDI], sets the flags, and advances RDI. repnz scasb repeats this up to RCX times while ZF=0.

Quote from: MediocreVeg1
Wouldn't the assembler get confused if I used 64-bit syscalls on 32-bit registers? Or if I put some arguments of a syscall in R?? registers and others in E?? registers?
Nope! E?? registers are the lower 32 bits of R?? registers. When you use an R?? register, the instruction gets a REX prefix (and an immediate can be bigger), for example:

Code:
  mov eax,-1    ; B8 FF FF FF FF
  mov rax,-1    ; 48 B8 FF FF FF FF FF FF FF FF

When writing to EAX, the upper 32 bits of RAX are automagically (hehe) zeroed.

Quote from: MediocreVeg1
Yeah, I put it into my procedure as well after you showed your example. Wouldn't this event be highly unlikely though? I think 2^32-1 is like 4294967295 bytes so every single byte after the starting address of the string would have to be non-zero, right?
That's why it doesn't make sense to use R?? registers to hold string lengths...

[]s
Fred