Recent Posts

Pages: 1 ... 8 9 [10]
91
That would return string length + 1....
Oops... sorry... my bad...
92
Example Code / Re: My own 64-bit `puts' instruction (No length required)
« Last post by munair on July 22, 2023, 03:38:17 PM »
So strlen could be implemented as:
Code: [Select]
; Same as: size_t strlen( const char * );
; the function assumes ALL strings will be NUL terminated.
strlen_:
  xor eax,eax
  lea ecx,[rax-1]   ; Limiting the string size to 2³²-1, max.
  mov rdx,rdi
  repnz scasb     ; Scan for '\0'...
  sub rdi,rdx
  mov rax,rdi     ; returns size in RAX.
  ret

That would return string length + 1. So alternatively:

Code: [Select]
    xor     eax, eax
    lea     ecx, [rax - 1]
    mov     rdx, rdi
    repnz   scasb
    sub     rdi, rdx
    ;mov     rax, rdi
    lea     rax, [rdi - 1]        ; not counting the null terminator
93
Maybe I'm not understanding how scas works, but isn't the result of scasb stored in rbx in 64-bit assembly? That's why I cleared it with XOR before using scasb and why I subtracted it from rdi (which has the starting address of the string). I'm probably wrong here though.
SCASB reads a byte from ES:RDI, compares it with AL (setting the flags) and updates RDI. With the REPNZ (or REPNE) prefix it repeats at most RCX times, for as long as ZF=0 (hence the NZ). So strlen could be implemented as:
Code: [Select]
; Same as: size_t strlen( const char * );
; the function assumes ALL strings will be NUL terminated.
strlen_:
  xor eax,eax
  lea ecx,[rax-1]   ; Limiting the string size to 2³²-1, max.
  mov rdx,rdi
  repnz scasb     ; Scan for '\0'...
  sub rdi,rdx
  mov rax,rdi     ; returns size in RAX.
  ret
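
To make the REPNZ SCASB loop easier to follow, here is a rough C model of the routine above. It is only a sketch: the names `repnz_scasb_model`, `strlen_plus_one` and `strlen_fixed` are made up for illustration, and the model assumes the direction flag is clear. Tracing it suggests that the pointer difference also counts the terminating NUL, so one has to be subtracted to get the usual strlen result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of REPNZ SCASB with AL = 0 and DF = 0:
   while RCX != 0, compare the byte at *p with 0, advance p,
   decrement RCX, and stop once a match was found (ZF = 1).
   Returns the advanced pointer (the final value of RDI). */
static const char *repnz_scasb_model(const char *p, uint64_t *rcx)
{
    while (*rcx != 0) {
        char c = *p++;      /* SCASB: compare, then RDI += 1 */
        (*rcx)--;           /* REP prefix: RCX -= 1 */
        if (c == '\0')      /* ZF = 1 ends a REPNZ loop */
            break;
    }
    return p;
}

/* Mirrors the routine as posted: end pointer minus start pointer. */
size_t strlen_plus_one(const char *s)
{
    assert(s != NULL);
    uint64_t rcx = 0xFFFFFFFFu;     /* lea ecx,[rax-1] with rax = 0 */
    const char *end = repnz_scasb_model(s, &rcx);
    return (size_t)(end - s);       /* includes the terminator */
}

/* Same scan, but with one subtracted so the NUL is not counted. */
size_t strlen_fixed(const char *s)
{
    return strlen_plus_one(s) - 1;
}
```

For a string such as "abc" the first function yields 4 and the second 3.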

Wouldn't the assembler get confused if I used 64-bit syscalls on 32-bit registers? Or if I put some arguments of a syscall in R?? registers and others in E?? registers?
E?? registers are the lower 32 bits of the R?? registers. And, in x86-64 mode, when you write to an E?? register the upper 32 bits of the corresponding R?? register are automatically zeroed... Instructions using R?? registers need an extra prefix (the REX prefix); with E?? registers no prefix is needed...
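
The zero-extension rule can be pictured with a small C model. This is just an illustration, not real register access: `write_r32`, `write_r16` and `check_register_models` are hypothetical names, and a `uint64_t` stands in for an R?? register. The 16-bit case is included for contrast, since writes to 16-bit registers leave the upper bits alone.

```c
#include <assert.h>
#include <stdint.h>

/* Model of writing a 32-bit value to E?? in 64-bit mode: the result
   replaces the whole R?? register, with the upper half zeroed. */
uint64_t write_r32(uint64_t old_r64, uint32_t value)
{
    (void)old_r64;                  /* old contents do not survive */
    return (uint64_t)value;
}

/* For contrast: a 16-bit write (e.g. to AX) keeps bits 16..63. */
uint64_t write_r16(uint64_t old_r64, uint16_t value)
{
    return (old_r64 & ~UINT64_C(0xFFFF)) | value;
}

/* Small self-check of the two models. */
void check_register_models(void)
{
    uint64_t rax = UINT64_MAX;
    /* like xor eax,eax: all 64 bits end up zero */
    assert(write_r32(rax, 0) == 0);
    /* like mov ax,0: only the low 16 bits change */
    assert(write_r16(rax, 0) == UINT64_C(0xFFFFFFFFFFFF0000));
}
```

This is why `xor eax,eax` is enough to clear all of RAX before the scan.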

Quote
Notice in my routine, if '\0' isn't found in a block of 2³²-1 bytes it returns -1 (all bits set) in RAX. This allows you to test an error:
... Wouldn't this event be highly unlikely though? ...
You are right!... It is easier to assume the routine expects ALL strings to be zero terminated...
94
Example Code / Re: My own 64-bit `puts' instruction (No length required)
« Last post by munair on July 22, 2023, 09:36:19 AM »
Just getting my feet wet with 64 bits (yes, my very first code)  :D. If I understand correctly, there is some optimization to be gained from using 32 bit registers with the string length (which would probably never exceed 4GB).

As usual, no C externals here.  ;D
Code: [Select]
; nasm -f elf64 puts.asm
; ld -m elf_x86_64 puts.o -o puts

bits 64

section .text

    global _start

_start:
    mov     rdi, msg            ; string
    call    strlen              ; length
    test    rax, rax            ; anything?
    jz      .__out
    call    puts                ; show it
  .__out:
    mov     rax, 0x3c
    xor     rdi, rdi
    syscall

puts:
    mov     rsi, rdi            ; string
    mov     rdi, 1              ; stdout
    mov     rdx, rax            ; length (from strlen)
    mov     rax, 1              ; write
    syscall
    ret

strlen:
    push    rdi                 ; save address
    sub     rcx, rcx
    not     rcx                 ; rcx = -1
    xor     eax, eax
    cld                         ; count forward
    repne   scasb
    not     rcx
    lea     rax, [rcx - 1]      ; length
    pop     rdi
    ret

section .data
    msg db "Hello world!", 10, 0
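
The counter arithmetic in the strlen routine above (start at -1, NOT, then subtract 1) may be easier to check in C. The following is only a sketch of that arithmetic, with the hypothetical name `strlen_not_trick`; it models the register with a `uint64_t` and does not touch real registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the routine: RCX starts at -1 and is
   decremented once per scanned byte, terminator included. */
size_t strlen_not_trick(const char *s)
{
    assert(s != NULL);
    uint64_t rcx = ~UINT64_C(0);    /* sub rcx,rcx / not rcx => -1 */
    const char *p = s;
    char c;

    do {                            /* repne scasb with al = 0 */
        c = *p++;
        rcx--;
    } while (c != '\0' && rcx != 0);

    rcx = ~rcx;                     /* not rcx => bytes scanned (n + 1) */
    return (size_t)(rcx - 1);       /* lea rax,[rcx-1] => n */
}
```

After scanning n + 1 bytes the counter holds -(n + 2); NOT turns that into n + 1, and the final subtraction drops the terminator. For the message in the program above (12 characters plus a newline) the model gives 13.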
95
Programming with NASM / Re: About obsolete practices...
« Last post by munair on July 21, 2023, 02:16:17 PM »
Without optimizations, compilers simply output the logical translation of the source code step by step, even if it means unnecessarily reading back from the stack. Compilers are among the most complex pieces of software; each process has to be logical and clear. Therefore, optimization is by necessity a separate step.

In its current state, whereby optimization has not been implemented yet, the SharpBASIC compiler is doing even worse if we take your example:

Code: (SB) [Select]
func f(x:int):int
do
  f = x + 1;
end

which is translated to:

Code: [Select]
_I107:
push    ebp
mov     ebp, esp
sub     esp, 4
mov     dword [ebp - 4], 0
mov     eax, dword [ebp + 8]
push    eax
mov     eax, 1
pop     edx
add     eax, edx
mov     [ebp - 4], eax
._L0:
mov     eax, dword [ebp - 4]
mov     esp, ebp
pop     ebp
ret

The expression parser doesn't care much what is being computed; it simply follows an initial standard logic by pushing and popping lhs and rhs operands. Obviously, there is a LOT of optimization left to do. But the whole process, from translation to executable IMO is just magical if you think about it.
96
Programming with NASM / Re: About obsolete practices...
« Last post by fredericopissarra on July 21, 2023, 12:15:42 PM »
In normal language prologue is a part that comes "before", while epilogue is a part that comes "after".
This is the dictionary definition of the words...

Code: (masm) [Select]
.LC0:
        .string "%u\n"
main:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        mov     DWORD PTR [rbp-12], 3
        mov     DWORD PTR [rbp-16], 1
        mov     eax, DWORD PTR [rbp-12]
        xor     eax, DWORD PTR [rbp-16]
        mov     DWORD PTR [rbp-4], eax
        mov     eax, DWORD PTR [rbp-12]
        and     eax, DWORD PTR [rbp-16]
        mov     DWORD PTR [rbp-8], eax
        jmp     .L2
.L3:
        sal     DWORD PTR [rbp-8]
        mov     eax, DWORD PTR [rbp-4]
        mov     DWORD PTR [rbp-12], eax
        mov     eax, DWORD PTR [rbp-8]
        mov     DWORD PTR [rbp-16], eax
        mov     eax, DWORD PTR [rbp-12]
        xor     eax, DWORD PTR [rbp-16]
        mov     DWORD PTR [rbp-4], eax
        mov     eax, DWORD PTR [rbp-12]
        and     eax, DWORD PTR [rbp-16]
        mov     DWORD PTR [rbp-8], eax
.L2:
        cmp     DWORD PTR [rbp-8], 0
        jne     .L3
        mov     eax, DWORD PTR [rbp-4]
        mov     esi, eax
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf
        mov     eax, 0
        leave
        ret
Why the prologue/epilogue, since in x86-64 mode arguments are normally passed through registers? Here, without optimizations, the compiler chooses to use the stack to hold the local objects (unnecessary as well, since there are enough registers to hold them). Notice that not even a minimum of optimization is done (mov eax,0 is bigger than xor eax,eax and is not recognized as a zeroing idiom).

LEAVE has a throughput of 4 cycles, while POP RBP has 3 (that's WHY the compiler doesn't use the ENTER instruction: 8 cycles, against only 3 for PUSH RBP).

Without optimizations the compiler will always create inefficient code. Here's an example:
Code: [Select]
; int f( int x ) { return x + 1; }

; -O2 -fomit-frame-pointer    ; No optimizations
f:                            f:
  lea eax, [rdi+1]              push  rbp
  ret                           mov   rbp, rsp

                                mov   [rbp-4], edi  ; write argument on the stack.
                                mov   eax, [rbp-4]  ; read back from stack (why?!)
                                add   eax, 1

                                pop rbp
                                ret

The unoptimized code is the worst code possible: 4 accesses to memory (2 potential cache misses) and no instructions can be paired (each one depends on the previous). Not considering call/ret, the optimized version runs in 3 cycles, the unoptimized one in 12 (at least).

In i386 mode, using the cdecl convention, the compiler uses the stack, but even then the prologue/epilogue aren't necessary; dropping them gets rid of 3 instructions and 7 cycles. Unoptimized code will also add other artifacts (especially if you are using PIE executables):
Code: [Select]
; optimized             ; not optimized
f:                      f:
  mov eax, [esp+4]        push  ebp
  add eax, 1              mov ebp, esp
  ret
                          call  __x86.get_pc_thunk.ax
                          add eax, _GLOBAL_OFFSET_TABLE_
                          mov eax, [ebp+8]
                          add eax, 1

                          pop ebp
                          ret

                        __x86.get_pc_thunk.ax:
                          mov eax, [esp]
                          ret

Again: This is an OLD practice and should be avoided, especially in assembly. The ONLY purpose of prologues/epilogues was to allow access to the stack on old 8086/80186/80286 processors. Since the 386 this is unnecessary. Without it, RBP is free to use as a "general purpose" register instead of as a base pointer to the stack.
97
Programming with NASM / Re: About obsolete practices...
« Last post by munair on July 21, 2023, 06:21:55 AM »
In normal language prologue is a part that comes "before", while epilogue is a part that comes "after". With stack frames this is the same:

Quote
In assembly language programming, the function prologue is a few lines of code at the beginning of a function, which prepare the stack and registers for use within the function. Similarly, the function epilogue appears at the end of the function,
Source: wikipedia

That said, without optimization switches, GCC still generates old-fashioned stack frames. Have a look at the following example:
Code: (C) [Select]
#include <stdio.h>

int main( void )
{
    unsigned int x = 3, y = 1, sum, carry;
    sum = x ^ y; // x XOR y
    carry = x & y; // x AND y
    while (carry != 0)
    {
        carry = carry << 1; // left shift the carry
        x = sum; // initialize x as sum
        y = carry; // initialize y as carry
        sum = x ^ y; // sum is calculated
        carry = x & y; /* carry is calculated, the loop condition is
                          evaluated and the process is repeated until
                          carry is equal to 0.
                        */
    }
    printf("%u\n", sum); // the program will print 4
    return 0;
}

On Compiler Explorer, GCC 13.1 generates the following Intel-syntax assembly:
Code: (masm) [Select]
.LC0:
        .string "%u\n"
main:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        mov     DWORD PTR [rbp-12], 3
        mov     DWORD PTR [rbp-16], 1
        mov     eax, DWORD PTR [rbp-12]
        xor     eax, DWORD PTR [rbp-16]
        mov     DWORD PTR [rbp-4], eax
        mov     eax, DWORD PTR [rbp-12]
        and     eax, DWORD PTR [rbp-16]
        mov     DWORD PTR [rbp-8], eax
        jmp     .L2
.L3:
        sal     DWORD PTR [rbp-8]
        mov     eax, DWORD PTR [rbp-4]
        mov     DWORD PTR [rbp-12], eax
        mov     eax, DWORD PTR [rbp-8]
        mov     DWORD PTR [rbp-16], eax
        mov     eax, DWORD PTR [rbp-12]
        xor     eax, DWORD PTR [rbp-16]
        mov     DWORD PTR [rbp-4], eax
        mov     eax, DWORD PTR [rbp-12]
        and     eax, DWORD PTR [rbp-16]
        mov     DWORD PTR [rbp-8], eax
.L2:
        cmp     DWORD PTR [rbp-8], 0
        jne     .L3
        mov     eax, DWORD PTR [rbp-4]
        mov     esi, eax
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf
        mov     eax, 0
        leave
        ret

Perhaps one of the reasons is that prologues and epilogues can contain code for buffer overflow protection.
98
Example Code / Re: Assembly version of perror() maybe
« Last post by Frank Kotler on July 20, 2023, 08:11:46 PM »
Hi Fred,

Let me repeat what I said of my code in the first place:
"it is what it is"

Certainly it is for Linux only. Assembly is not portable. That's one you and I agree on! Perhaps that's a good place to stop. :)

Best,
Frank

99
Example Code / Re: Assembly version of perror() maybe
« Last post by munair on July 20, 2023, 05:58:23 PM »
There's a small confusion here. The first Mac OS (OS/2 and Windows too, by the way) used the pascal calling convention. They weren't written in Pascal per se.

I must correct you there. The original operating system for the Macintosh in the early 1980s was first written in Pascal (I have seen the source code). It was only after system resources became limited that part of the code was ported to 68K assembly. You can read about it here.
100
Example Code / Re: Assembly version of perror() maybe
« Last post by fredericopissarra on July 20, 2023, 05:21:12 PM »
There can be plenty of reasons not to like C. Most importantly, it can be hard to debug.
Surely it isn't harder than debugging assembly programs!

Let's not forget that the first Mac OS in the 1980s was coded in Pascal, before it was surpassed by C as popular language -- largely thanks to UNIX.
There's a small confusion here. The first Mac OS (OS/2 and Windows too, by the way) used the pascal calling convention. They weren't written in Pascal per se. To this day Microsoft keeps something like the pascal calling convention on Windows: stdcall is like pascal except for argument ordering, which follows C's right-to-left pushing (in i386 mode, for example); the stack is cleaned by the called function, not the caller.

One of the few pieces of software I know of that was developed entirely in Pascal was the first version of Photoshop.