I found strange your approach, since sub rdi,rbx (with rbx == 0) do nothing, except affect the flags. Din't you mean to use 'mov rbx,rdi' instead of 'xor rbx,rbx'?
Maybe I'm not understanding how scas works, but isn't the result of scasb stored in rbx in 64-bit assembly? That's why I cleared it with XOR before using scasb and why I substracted it from rdi (which has starting address of string). I'm probably wrong here though.
Nope... repnz scasb searches for AL in RDI and forward (increasing RDI and decreasing RCX). It won't change RBX.
PS: Try to use E?? registers as many times as possible. Using R?? will imply a REX prefix and bigger instructions. moving and doing arithmetic/logical operations to E?? registers will automatically zero the upper 32 bits of R?? registers for you (this doen't work in a few instructions as, for example, CDQ [will zero upper RDX but not upper RAX).
Wouldn't the assembler get confused if I used 64-bit syscalls on 32-bit registers? Or if I put some arguments of a syscall in R?? registers and others in E?? registers?
It won't get confused because E?? registers are part of R?? registers. When you use a R?? register NASM is forced to add a prefix to the instruction (REX prefix). This prefix isn't added if you use E?? registers. Of course, if the argument is an address (a pointer) you are forced to use R??.
S: Notice in my routine, if '\0' isn't found in a block of 2³²-1 bytes it returns -1 (all bits set) in RAX. This allows you to test an error:
Yeah, I put it into my procedure as well after you showed your example. Wouldn't this event be highly unlikely though? I think 2^32-1 is like 4294967295 bytes so every single byte after the starting address of the string would have to be non-zero, right?
Yep... The problem when you ser RCX to -1 is that, theoretically you can search a 18446744073709551615 bytes long string. If such string exists (not possible), than REP SCASB would take 192 YEARS to complete (@ 3 GHz). If you restrict the string to 2³²-1 bytes, if such string exists (improbable), it would take only 1.5 seconds to scan.