Topic: My own 64-bit `puts' instruction (No length required) (Read 274832 times)

fredericopissarra · « **Reply #15 on:** January 18, 2021, 06:40:38 PM »

Quote from: MediocreVeg1 on January 18, 2021, 03:06:55 PM

Quote
I found your approach strange, since sub rdi,rbx (with rbx == 0) does nothing except affect the flags. Didn't you mean to use 'mov rbx,rdi' instead of 'xor rbx,rbx'?
Maybe I'm not understanding how scas works, but isn't the result of scasb stored in rbx in 64-bit assembly? That's why I cleared it with XOR before using scasb and why I subtracted it from rdi (which holds the starting address of the string). I'm probably wrong here, though.

Nope... repnz scasb compares AL with the byte at [RDI] and moves forward (incrementing RDI and decrementing RCX on each step) until it finds a matching byte or RCX reaches 0. It never touches RBX.
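To make this concrete, here is a small C model of what `repnz scasb` does to the registers. It is a sketch that assumes DF = 0 (after `cld`); `struct regs` and `repnz_scasb` are hypothetical names for illustration, not a real API. Note that RBX is never read or written:

```c
#include <stdint.h>

/* Hypothetical register snapshot: only the registers relevant here. */
struct regs {
    uint64_t rax;        /* AL (low byte) holds the byte searched for        */
    uint64_t rbx;        /* never read or written by SCASB                   */
    uint64_t rcx;        /* repeat counter                                   */
    const uint8_t *rdi;  /* scan pointer (ES base is 0 in 64-bit mode)       */
};

/* Models REPNZ SCASB with DF = 0. Each step compares AL with [RDI],
   increments RDI and decrements RCX; it stops when RCX hits 0 or when the
   compared byte equals AL (ZF = 1). Returns the final ZF (0 if RCX was 0 on
   entry; the real CPU would leave the flags unchanged in that case). */
static int repnz_scasb(struct regs *r)
{
    uint8_t al = (uint8_t)r->rax;
    int zf = 0;
    while (r->rcx != 0) {
        uint8_t byte = *r->rdi;
        r->rdi++;
        r->rcx--;
        zf = (byte == al);
        if (zf)
            break;
    }
    return zf;
}
```

For the string "abc" with AL = 0 and RCX = -1, four bytes are examined (including the NUL), so RDI ends one byte past the terminator, RCX drops by 4, and RBX keeps whatever it held.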

Quote

Quote
PS: Try to use E?? registers whenever possible. Using R?? registers implies a REX prefix and therefore bigger instructions. Moves and arithmetic/logical operations that write an E?? register automatically zero the upper 32 bits of the corresponding R?? register for you (this doesn't apply to every instruction: CDQ, for example, zeroes the upper half of RDX but leaves the upper half of RAX untouched, because it only writes EDX).
Wouldn't the assembler get confused if I used 64-bit syscalls on 32-bit registers? Or if I put some arguments of a syscall in R?? registers and others in E?? registers?

It won't get confused, because E?? registers are part of R?? registers. When you use an R?? register, NASM has to add a prefix (the REX prefix) to the instruction; this prefix isn't needed when you use E?? registers. Of course, if the argument is an address (a pointer), you have to use R??.
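The zero-extension rule from the PS above can be modelled in C. This is only a sketch of the architectural effect on the 64-bit registers; `write_e32` and `cdq` are hypothetical helper names:

```c
#include <stdint.h>

/* Writing a 32-bit register (e.g. "mov edi, eax") in 64-bit mode:
   the result is zero-extended into the full 64-bit register. */
static uint64_t write_e32(uint32_t value)
{
    return (uint64_t)value;   /* upper 32 bits become 0 */
}

/* CDQ: sign-extends EAX into EDX:EAX. Only EDX is written, so only the
   upper half of RDX is zeroed; RAX (including its upper half) is unchanged. */
static void cdq(uint64_t *rax, uint64_t *rdx)
{
    uint32_t eax = (uint32_t)*rax;
    uint32_t edx = (eax & 0x80000000u) ? 0xFFFFFFFFu : 0u;
    *rdx = write_e32(edx);
    /* *rax is deliberately left alone */
}
```

With RAX = 0xDEADBEEF00000001, CDQ sets RDX to 0 (EAX is positive) while the garbage in the upper half of RAX survives.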

Quote

Quote
PS: Notice in my routine, if '\0' isn't found in a block of 2³²-1 bytes it returns -1 (all bits set) in RAX. This allows you to test for an error:
Yeah, I put it into my procedure as well after you showed your example. Wouldn't this event be highly unlikely though? I think 2^32-1 is like 4294967295 bytes so every single byte after the starting address of the string would have to be non-zero, right?

Yep... The problem with setting RCX to -1 is that, theoretically, you can search a string 18446744073709551615 bytes long. If such a string existed (it can't), REP SCASB would take about 195 YEARS to complete (at 3 GHz, assuming one byte per clock). If you restrict the string to 2³²-1 bytes, even if such a string exists (improbable), the scan takes less than 1.5 seconds.
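The estimate is simple arithmetic: bytes divided by clock rate. A quick C sketch, assuming one byte per cycle (real REPNZ SCASB throughput varies by CPU, so treat the numbers as ballpark):

```c
/* Rough worst-case scan time, assuming one byte examined per clock cycle. */
static double scan_seconds(double bytes, double clock_hz)
{
    return bytes / clock_hz;
}

/* Julian year, used only to convert seconds into years. */
static const double SECONDS_PER_YEAR = 365.25 * 24.0 * 3600.0;
```

At 3 GHz, 2⁶⁴-1 bytes works out to roughly 6.1 × 10⁹ seconds (about 195 years), while 2³²-1 bytes takes roughly 1.43 seconds.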

MediocreVeg1 · « **Reply #16 on:** January 19, 2021, 11:31:18 AM »

Quote

Nope... repnz scasb searches for AL in RDI and forward (increasing RDI and decreasing RCX). It won't change RBX.

Oh man, I'm so stupid. How does my procedure even work then? Sheesh, now I'm even confused as to how my own procedure works

Quote

It won't get confused because E?? registers are part of R?? registers. When you use a R?? register NASM is forced to add a prefix to the instruction (REX prefix). This prefix isn't added if you use E?? registers. Of course, if the argument is an address (a pointer) you are forced to use R??

Oh, I see. Would that mean that I would be able to enter the call number in eax too? And for example in the second argument of a system call, would I use ebx or edi for 32-bit registers?

Quote

Yep... The problem with setting RCX to -1 is that, theoretically, you can search a string 18446744073709551615 bytes long. If such a string existed (it can't), REP SCASB would take about 195 YEARS to complete (at 3 GHz, assuming one byte per clock). If you restrict the string to 2³²-1 bytes, even if such a string exists (improbable), the scan takes less than 1.5 seconds.

Wow, that's... a lot. Anyway, I think this answers my question. Thanks.

fredericopissarra · « **Reply #17 on:** January 19, 2021, 05:13:06 PM »

Quote from: MediocreVeg1

Oh man, I'm so stupid. How does my procedure even work then? Sheesh, now I'm even confused as to how my own procedure works

I confess I cannot see how it works, either...

Quote

Oh, I see. Would that mean that I would be able to enter the call number in eax too? And for example in the second argument of a system call, would I use ebx or edi for 32-bit registers?

A simple example. If you want to print a string you can do:

Code:

  bits 64
  default rel   ; Use RIP relative addressing.

  section .rodata

msg: db `Hello\n`
msg_len equ $ - msg

  section .text

  global writemsg
writemsg:
  mov eax,1       ; RAX = 1 (sys_write); writing EAX zeroes the upper 32 bits.
  mov edi,eax     ; RDI = 1 = STDOUT_FILENO (upper 32 bits zeroed).
  lea rsi,[msg]   ; RSI must be loaded in full (it is a pointer); LEA because 'msg' is RIP-relative.
  mov edx,msg_len ; RDX = length (upper 32 bits zeroed).
  syscall
  ret

Those E?? movs are shorter than if you use R?? regs.
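For comparison, this is the same system call written in C with POSIX `write()`. It is a sketch for illustration only: on x86-64 Linux the `syscall` instruction takes the call number in RAX and the first three arguments in RDI, RSI and RDX (EBX/ECX/EDX belong to the legacy 32-bit `int 0x80` interface), which is exactly what the routine above loads:

```c
#include <unistd.h>   /* write(), STDOUT_FILENO, ssize_t */

static const char msg[] = "Hello\n";

/* Equivalent of the writemsg routine: write(fd, buf, count), i.e.
   RAX = 1 (sys_write), RDI = 1 (stdout), RSI = msg, RDX = msg_len.
   sizeof msg - 1 drops C's implicit NUL, matching msg_len (6 bytes). */
ssize_t writemsg(void)
{
    return write(STDOUT_FILENO, msg, sizeof msg - 1);
}
```

The return value is the number of bytes written (6 here), or -1 on error, just like the value left in RAX after the `syscall`.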

[]s
Fred

MediocreVeg1 · « **Reply #18 on:** January 19, 2021, 05:27:09 PM »

Quote

I confesss I cannot see how it works, too...

Yeah, I think I'll switch my code to your solution so I at least understand what is going on.

And about the registers, I think I finally get it now. Thanks for all the help!

munair · « **Reply #19 on:** July 22, 2023, 09:36:19 AM »

Just getting my feet wet with 64 bits (yes, my very first code). If I understand correctly, there is some optimization to be gained from using 32-bit registers for the string length (which would probably never exceed 4 GB).

As usual, no C externals here.

Code:

; nasm -f elf64 puts.asm
; ld -m elf_x86_64 puts.o -o puts

bits 64

section .text

    global _start

_start:
    mov     rdi, msg            ; string
    call    strlen              ; length
    test    rax, rax            ; anything?
    jz      .__out
    call    puts                ; show it
  .__out:
    mov     rax, 0x3c           ; sys_exit
    xor     rdi, rdi            ; exit status 0
    syscall

puts:
    mov     rsi, rdi            ; string
    mov     rdi, 1              ; stdout
    mov     rdx, rax            ; length (from strlen)
    mov     rax, 1              ; write
    syscall
    ret

strlen:
    push    rdi                 ; save address
    sub     rcx, rcx
    not     rcx                 ; rcx = -1 (no practical limit)
    xor     eax, eax            ; al = 0, the byte to search for
    cld                         ; count forward
    repne   scasb
    not     rcx                 ; rcx = bytes scanned, including the NUL
    lea     rax, [rcx - 1]      ; length
    pop     rdi
    ret

section .data
    msg db "Hello world!", 10, 0
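The counter trick in this strlen can be checked with a C model. This is a sketch; `strlen_by_counter` is a hypothetical name, and it ignores RCX reaching 0 because a count starting at 2⁶⁴-1 never gets there in practice:

```c
#include <stdint.h>

/* Models the strlen above: RCX starts at -1 (all bits set) and REPNE SCASB
   decrements it once per byte examined, including the terminating NUL. */
static uint64_t strlen_by_counter(const char *s)
{
    uint64_t rcx = UINT64_MAX;          /* sub rcx,rcx / not rcx -> -1      */
    const char *rdi = s;
    for (;;) {                          /* repne scasb with AL = 0          */
        char byte = *rdi++;
        rcx--;
        if (byte == '\0')
            break;
    }
    rcx = ~rcx;                         /* not rcx -> bytes scanned (n + 1) */
    return rcx - 1;                     /* lea rax,[rcx-1] -> n             */
}
```

For a string of n characters, n + 1 bytes are scanned, so RCX ends at -(n + 2); `not` turns that into n + 1, and subtracting 1 gives n. For "Hello world!\n" that is 13.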

fredericopissarra · « **Reply #20 on:** July 22, 2023, 11:46:12 AM »

Quote from: MediocreVeg1 on January 18, 2021, 03:06:55 PM

Maybe I'm not understanding how scas works, but isn't the result of scasb stored in rbx in 64-bit assembly? That's why I cleared it with XOR before using scasb and why I subtracted it from rdi (which holds the starting address of the string). I'm probably wrong here, though.

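SCASB compares AL with the byte at ES:[RDI] (in 64-bit mode the ES base is always 0, so this is just [RDI]), sets the flags, and advances RDI (by +1 when DF=0). With the REPNZ (a.k.a. REPNE) prefix it repeats this, decrementing RCX each time, until RCX reaches 0 or a byte equal to AL sets ZF=1 (hence the NZ: "repeat while not zero"). Note that a plain REP prefix on SCASB means REPE/REPZ, which repeats while ZF=1, the opposite test. So strlen could be implemented as: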

Code:

; Same as: size_t strlen( const char * );
; the function assumes ALL strings will be NUL terminated.
strlen_:
  xor eax,eax
  lea ecx,[rax-1]   ; Limiting the string size to 2³²-1, max.
  mov rdx,rdi
  repnz scasb     ; Scan for '\0'...
  sub rdi,rdx
  mov rax,rdi     ; returns size in RAX.
  ret

Quote from: MediocreVeg1 on January 18, 2021, 03:06:55 PM

Wouldn't the assembler get confused if I used 64-bit syscalls on 32-bit registers? Or if I put some arguments of a syscall in R?? registers and others in E?? registers?

E?? registers are the lower 32 bits of R?? registers. And, in x86-64 mode, writing an E?? register automatically zeroes the upper 32 bits of the R?? register... Instructions using R?? registers need a prefix (the REX prefix); with E?? registers there's no prefix (except for R8D-R15D, which still need REX to be encoded)...

Quote from: MediocreVeg1 on January 18, 2021, 03:06:55 PM

Quote
Notice in my routine, if '\0' isn't found in a block of 2³²-1 bytes it returns -1 (all bits set) in RAX. This allows you to test an error:
... Wouldn't this event be highly unlikely though? ...

You are right!... It's easier to assume the routine expects ALL strings to be NUL-terminated...

munair · « **Reply #21 on:** July 22, 2023, 03:38:17 PM »

Quote from: fredericopissarra on July 22, 2023, 11:46:12 AM

So strlen could be implemented as:
Code:

; Same as: size_t strlen( const char * );
; the function assumes ALL strings will be NUL terminated.
strlen_:
  xor eax,eax
  lea ecx,[rax-1]   ; Limiting the string size to 2³²-1, max.
  mov rdx,rdi
  repnz scasb     ; Scan for '\0'...
  sub rdi,rdx
  mov rax,rdi     ; returns size in RAX.
  ret

That would return string length + 1. So alternatively:

Code:

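    xor     eax, eax
    lea     ecx, [rax - 1]        ; ecx = 2³²-1 (upper half of RCX zeroed)
    mov     rdx, rdi              ; save the start address
    repnz   scasb                 ; RDI stops one byte past the NUL
    sub     rdi, rdx
    ;mov     rax, rdi
    lea     rax, [rdi - 1]        ; not counting the null terminator
    ret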
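Why the original returns length + 1: after the scan stops on the NUL, RDI has already been advanced past it. A C sketch of both versions (hypothetical names; the 2³²-1 limit set through ECX is left out for brevity):

```c
#include <stddef.h>

/* After REPNZ SCASB stops on the NUL, RDI already points one byte PAST it,
   so RDI - start counts the NUL as well. */
static size_t scanned_bytes(const char *start)
{
    const char *rdi = start;
    while (*rdi++ != '\0')              /* compare, then advance (DF = 0) */
        ;
    return (size_t)(rdi - start);       /* sub rdi,rdx -> length + 1       */
}

/* The corrected version: subtract the terminator. */
static size_t strlen_by_pointer(const char *start)
{
    return scanned_bytes(start) - 1;    /* lea rax,[rdi-1] -> length       */
}
```

For "Hello", five characters plus the NUL are scanned, so the pointer difference is 6 and the corrected result is 5.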

fredericopissarra · « **Reply #22 on:** July 22, 2023, 07:10:09 PM »

Quote from: munair on July 22, 2023, 03:38:17 PM

That would return string length + 1....

Oops... sorry... my bad...

alCoPaUL · « **Reply #23 on:** October 03, 2023, 05:08:47 PM »

You can just loop, printing one character at a time until you hit \0.

That way you can display 1 gigabyte or more of text and stop when you reach the \0.
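A minimal C sketch of that idea, assuming POSIX `write()` and a hypothetical helper name `put_bytewise`. No length is computed up front, but each character costs a full system call, so for long strings it is much slower than a single `write()` of the whole buffer:

```c
#include <unistd.h>   /* write() */

/* Print a NUL-terminated string one byte per write() call, stopping at '\0'.
   Returns the number of bytes written, or -1 on error. */
static long put_bytewise(int fd, const char *s)
{
    long count = 0;
    for (; *s != '\0'; s++) {
        if (write(fd, s, 1) != 1)
            return -1;
        count++;
    }
    return count;
}
```

Usage: `put_bytewise(1, "Hello\n");` writes the six bytes to stdout and returns 6.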

fredericopissarra · « **Reply #24 on:** October 05, 2023, 03:42:59 PM »

Quote from: MediocreVeg1 on January 18, 2021, 03:06:55 PM

Maybe I'm not understanding how scas works, but isn't the result of scasb stored in rbx in 64-bit assembly?

Nope! scasb compares AL with the byte at ES:[RDI], setting the flags (and advancing RDI). repnz scasb repeats that, up to RCX times, while ZF=0, i.e. until it finds a byte equal to AL.

Quote from: MediocreVeg1

Wouldn't the assembler get confused if I used 64-bit syscalls on 32-bit registers? Or if I put some arguments of a syscall in R?? registers and others in E?? registers?

Nope! E?? registers are the lower 32 bits of R?? registers. When you use an R?? register, the instruction gets a REX prefix (and the immediate can be bigger), for example:

Code:

  mov eax,-1    ; B8 FF FF FF FF
  mov rax,-1    ; 48 B8 FF FF FF FF FF FF FF FF

When using EAX the upper 32 bits are automagically (hehe) zeroed.
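One consequence worth keeping in mind: the two instructions above only produce the same RAX when the value is a non-negative 32-bit number. `mov eax,-1` leaves RAX = 0x00000000FFFFFFFF (4294967295), not -1, which is exactly what `lea ecx,[rax-1]` exploits to get a 2³²-1 limit. A C sketch of the resulting values (helper names are hypothetical):

```c
#include <stdint.h>

/* Value left in RAX by "mov eax, imm32" in 64-bit mode. */
static uint64_t after_mov_eax(int32_t imm)
{
    return (uint64_t)(uint32_t)imm;   /* zero-extended, never sign-extended */
}

/* Value left in RAX by "mov rax, imm". */
static uint64_t after_mov_rax(int64_t imm)
{
    return (uint64_t)imm;             /* full 64-bit value */
}
```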

Quote from: MediocreVeg1

Yeah, I put it into my procedure as well after you showed your example. Wouldn't this event be highly unlikely though? I think 2^32-1 is like 4294967295 bytes so every single byte after the starting address of the string would have to be non-zero, right?

That's why it doesn't make sense to use R?? registers to hold string lengths...

[]s
Fred

NASM - The Netwide Assembler
