Recent Posts

Pages: 1 2 [3] 4 5 ... 10
21
Hello,

I just read about the new Intel Advanced Performance Extensions (Intel APX), which add 16 more general-purpose registers, new conditional move instructions, and other new features.  APX requires support from the assembler.  16 more GP registers is extremely exciting, as are the other new features.  See https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html

GNU Binutils, including gdb and ld, has already been updated to support it.  See https://sourceware.org/pipermail/binutils/2024-January/132213.html and https://www.phoronix.com/news/GNU-Binutils-2.42

Does NASM have plans to support it?   

Thanks very much.
22
The well-known Leibniz formula for calculating pi over many iterations.
Surely there are faster ways to calculate pi, and one may ask WHY calculate pi at all, but nevertheless...
It is a chance to practice working with floating-point numbers and to see how to print them (with the help of C's printf routine).

Quote
; nasm -f elf64 calculatepi1.asm -o calculatepi1.o
; gcc -no-pie -m64 calculatepi1.o -o calculatepi1

; Leibniz says : pi/4 =1/1 - 1/3 + 1/5 - 1/7 + 1/9 - 1/11 + 1/13 .....


section .data
    var_oben  dq 1.0   ; initial numerator ("oben" = top)
    var_unten dq 1.0   ; initial denominator ("unten" = bottom)
    var_0     dq 0.0   ; zero
    var_2     dq 2.0   ; two
    var_minus1 dq -1.0  ; minus one
    MSG_RESULT     db "pi = %0.18lf", 10, 0

section .text
   extern printf
    global main

main:
   push rbp
   mov rbp,rsp
    ; Set the number of iterations for the Leibniz formula
    mov rax, 200000

    movsd xmm2, qword [var_0]            ; xmm2 starts at zero and accumulates the Leibniz sum, i.e. pi/4

    ; Loop to calculate Pi using Leibniz formula
    leibniz_loop:
        ; Calculate next term
        movsd xmm0, qword [var_oben]
        movsd xmm1, qword [var_unten]
        divsd xmm0, xmm1
        addsd xmm2, xmm0               ; xmm2 holds the running sum (a quarter of pi)

        ; alternating sign
        movsd xmm3, qword[var_minus1]
        movsd xmm0, qword [var_oben]
        mulsd xmm0, xmm3
        movsd qword [var_oben], xmm0      

        ; add two to the denominator...
        movsd xmm3, qword [var_unten]
        addsd xmm3, qword [var_2]
        movsd qword [var_unten], xmm3


        ; Decrement loop counter
        dec rax
        jnz leibniz_loop

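    ; Multiply result by 4 (two multiplications by 2.0)
    movsd xmm3, qword [var_2]
    mulsd xmm2, xmm3
    mulsd xmm2, xmm3
    movsd xmm0, xmm2            ; first floating-point argument for printf

    mov rdi, MSG_RESULT         ; format string for printf
    mov rax, 1                  ; al = number of vector registers used by the variadic call
    call printf                 ; call the C function

    ; Return from main instead of issuing a raw sys_exit syscall:
    ; a raw exit skips the C runtime's stdio flush, so the output
    ; can be lost when stdout is redirected to a file or pipe.
    xor eax, eax                ; return value 0 (no error)
    pop rbp
    ret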


This is far from optimized for speed or anything else.  It is written to make it easier to understand what is going on.
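As a cross-check for the assembly above, here is a minimal C sketch of the same sum. The function name `leibniz_pi` and its local variable names are made up for this illustration and are not part of the original post; the comments map them back to the registers and variables in the listing.

```c
#include <assert.h>
#include <stdint.h>

/* Sum the first `terms` terms of 4 * (1/1 - 1/3 + 1/5 - 1/7 + ...). */
double leibniz_pi(uint64_t terms)
{
    /* The assembly loop decrements before testing, so it also
       assumes a non-zero iteration count. */
    assert(terms > 0);

    double sum = 0.0;          /* role of xmm2: running value of pi/4 */
    double numerator = 1.0;    /* role of var_oben: flips between +1 and -1 */
    double denominator = 1.0;  /* role of var_unten: 1, 3, 5, ... */

    for (uint64_t i = 0; i < terms; ++i) {
        sum += numerator / denominator;
        numerator = -numerator;   /* alternating sign */
        denominator += 2.0;       /* next odd denominator */
    }
    return 4.0 * sum;
}
```

Printing `leibniz_pi(200000)` with the same "%0.18lf" format gives a value to compare with the assembly program's output. With N terms the truncation error is on the order of 1/N, so only about five decimal places should be expected to agree with pi.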
23
Programming with NASM / Re: Things to do in _start
« Last post by fredericopissarra on March 09, 2024, 11:39:26 PM »
The reason why, in x86-64 mode, the stack is usually aligned to DQWORD (16 bytes) is SSE. Since SSE/SSE2 is available on every x86-64 capable Intel/AMD microprocessor, floating-point operations use SSE/SSE2 by default, and movaps instructions require DQWORD alignment (otherwise you'll get a General Protection Fault).

So, yep, it is useful to keep the stack aligned to DQWORD.
24
Programming with NASM / Re: Things to do in _start
« Last post by decuser on March 09, 2024, 07:56:59 PM »
After thinking it through based on what you've said and seeing a lot of FUD type posts elsewhere about this topic, I have come to the conclusion that:

1. Stack alignment (to 16-byte addresses) in Linux x64 programming is required by the ABI if you are calling glibc functions (using main, linking with gcc). The call pushes the 8-byte return address onto the stack, leaving it misaligned by 8 bytes, hence the push rbp (now it's realigned), mov rbp, rsp (save off the original sp), and and rsp, -16 (realign).

2. It is probably a good idea to align the stack generally, if you plan to access items placed on the stack prior to your program start (such as ARGC and ARGV).

Otherwise, it's not ... required...

Does this sound reasonable?
25
Programming with NASM / Re: Things to do in _start
« Last post by fredericopissarra on March 07, 2024, 04:18:50 PM »
Quote
That makes sense. When you say, main() has a misaligned RSP, how do you know? Is it because of the way nasm puts the binary together? I am doing pure Linux at the moment and using syscalls with _start, not doing the pseudo C, so I gather from what you're saying that I don't need the prolog. But, I will once I switch to main for the pseudo C stuff, so I'm curious how you know  :).

It is described in the ABIs. And you can see for yourself:
Code: [Select]
; test_win64.asm
;
;   c:\work> nasm -fwin64 -o test_win64.o test_win64.asm
;   c:\work> ld -s -o test_win64.exe test_win64.o -lkernel32
;
  bits  64
  default rel

  section .data

buffer:
  db    '0x'
  times 16 db '0'
  db    `\n`
bufferLength equ $ - buffer

  section .text

  extern __imp_GetStdHandle
  extern __imp_WriteConsoleA
  extern __imp_ExitProcess

  global _start
_start:
  ; sub rsp,8       ; align to DQWORD, if needed
 
  mov   rdx,rsp
  lea   rcx,[buffer+2]
  call  u64toStr

  mov   rcx,-11
  call  [__imp_GetStdHandle]
 
  mov   rcx,rax
  lea   rdx,[buffer]
  mov   r8d,bufferLength
  xor   r9,r9
  push  r9
  call  [__imp_WriteConsoleA]

  xor   ecx,ecx
  jmp   [__imp_ExitProcess]
 
; Destroys RAX, RDX and RDI.
  align 4
u64toStr:
  lea   rdi,[rcx+15]
  jmp   .test
.loop:
  mov   rax,rdx
  and   al,0x0f
  add   al,'0'
  cmp   al,'9'
  jbe   .skip
  add   al,7
.skip:
  mov   [rdi],al
  shr   rdx,4
  dec   rdi
.test:
  cmp   rdi,rcx
  jae   .loop
  ret
Code: [Select]
; test_sysv.asm
;
;    $ nasm -felf64 -o test_sysv.o test_sysv.asm
;    $ ld -s -o test_sysv test_sysv.o
;
  bits  64
  default rel

  section .data

buffer:
  db    '0x'
  times 16 db '0'
  db    `\n`
bufferLength equ $ - buffer

  section .text

  global _start
_start:
  ; NOTE: RSP is already DQWORD-aligned here (SysV ABI)!

  mov   rsi,rsp
  lea   rdi,[buffer+2]
  call  u64toStr

  mov   eax,1
  mov   edi,eax
  lea   rsi,[buffer]
  mov   edx,bufferLength
  syscall

  xor   edi,edi
  mov   eax,60
  syscall
 
  align 4
u64toStr:
  lea   rcx,[rdi+15]
  jmp   .test
.loop:
  mov   rax,rsi
  and   al,0x0f
  add   al,'0'
  cmp   al,'9'
  jbe   .skip
  add   al,7
.skip:
  mov   [rcx],al
  shr   rsi,4
  dec   rcx
.test:
  cmp   rcx,rdi
  jae   .loop
  ret
Running both of them (using MinGW64 for Windows):
Code: [Select]
c:\work> nasm -fwin64 -o test_win64.o test_win64.asm
c:\work> ld -s -o test_win64.exe test_win64.o -lkernel32
c:\work> test_win64
0x00000034601FF858
Code: [Select]
$ nasm -felf64 -o test_sysv.o test_sysv.asm
$ ld -s -o test_sysv test_sysv.o
$ ./test_sysv
0x00007FFD2BBF2D90
Notice the lowest 4 bits (the last hex digit): 8 on Windows, 0 on Linux...
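To make the comparison concrete, here is a small C sketch that extracts those low 4 bits from the two values printed in the runs above. The helper names `misalignment16` and `check_posted_values` are invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Distance of an address from the previous 16-byte boundary.
   0 means the address is DQWORD-aligned. */
unsigned misalignment16(uint64_t address)
{
    return (unsigned)(address & 0xFu);
}

/* The two RSP values from the test runs in this post. */
void check_posted_values(void)
{
    /* Windows _start: RSP ends in 8, so it is off by 8 bytes. */
    assert(misalignment16(0x00000034601FF858ULL) == 8);
    /* Linux _start: RSP ends in 0, so it is 16-byte aligned. */
    assert(misalignment16(0x00007FFD2BBF2D90ULL) == 0);
}
```

The exact addresses change from run to run; only the last hex digit matters for the alignment question.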

Anyway, see the SysV ABI for x86-64, section 3.4.1 (under "Stack State"):

Quote
%rsp  "The stack pointer holds the address of the byte with lowest address which is part of
the stack. It is guaranteed to be 16-byte aligned at process entry."

Under the SysV ABI, RSP at process entry points to argc, followed by the argv pointers and then the envp pointers, i.e. the arguments main() receives (see the ABI).
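A rough C sketch of that entry layout, assuming the initial-stack picture from the SysV x86-64 ABI: argc at [rsp], the argv pointers from [rsp+8], a null pointer after them, then the envp pointers. The function names `entry_argv` and `entry_envp` are hypothetical, and the stack is modelled as an array of 64-bit words rather than read from a real process.

```c
#include <assert.h>
#include <stdint.h>

/* entry_rsp models the value of RSP at _start: it points at argc. */
const uint64_t *entry_argv(const uint64_t *entry_rsp)
{
    return entry_rsp + 1;               /* argv[0] sits right after argc */
}

const uint64_t *entry_envp(const uint64_t *entry_rsp)
{
    uint64_t argc = entry_rsp[0];
    /* Skip argc itself, argc argument pointers, and the null terminator. */
    const uint64_t *envp = entry_rsp + 1 + argc + 1;
    assert(envp[-1] == 0);              /* argv[argc] must be a null pointer */
    return envp;
}
```

In assembly terms this corresponds to reading argc from [rsp], taking argv as rsp+8, and envp as rsp+8*(argc+2) at the very start of _start, before anything is pushed.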

For MS-ABI you'll have to search the MSDN.
26
Programming with NASM / Re: Things to do in _start
« Last post by decuser on March 07, 2024, 01:20:13 PM »
That makes sense. When  you say, main() has a misaligned RSP, how do you know? Is it because of the way nasm puts the binary together? I am doing pure Linux at the moment and using syscalls with _start, not doing the pseudo C, so I gather from what you're saying that I don't need the prolog. But, I will once I switch to main for the pseudo C stuff, so I'm curious how you know  :).
27
Programming with NASM / Re: Things to do in _start
« Last post by fredericopissarra on March 07, 2024, 12:43:30 PM »
There is no reason for this stuff... Well... not this way and not always.

If you are creating what I call pseudo-assembly code (creating a C program, using the C Runtime and libc from asm), then you must obey the ABI (MS-ABI or SysV ABI). This means RSP must be aligned to DQWORD (16 bytes) at every call. main() is entered with RSP misaligned by 8 bytes (the call pushed the return address), so you must do:
Code: [Select]
  global main
main:
  sub rsp,8    ; align RSP
  ...
  add rsp,8    ; restore RSP before returning
  xor  eax,eax  ; return 0;
  ret
Of course, using libc in _start isn't a good idea (you'll need to initialize the library, the C Runtime).
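The arithmetic behind the `sub rsp,8` in main can be sketched in C. This is only a model of the stack pointer as an integer, and the function names `rsp_after_call` and `rsp_after_prologue` are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* `call` pushes an 8-byte return address, so RSP drops by 8. */
uint64_t rsp_after_call(uint64_t rsp_at_call_site)
{
    /* The ABI requires 16-byte alignment at the point of the call. */
    assert(rsp_at_call_site % 16 == 0);
    return rsp_at_call_site - 8;        /* now RSP % 16 == 8 */
}

/* `sub rsp,8` (or a single `push`) moves RSP down by another 8 bytes,
   which brings it back to a multiple of 16. */
uint64_t rsp_after_prologue(uint64_t rsp_on_entry)
{
    return rsp_on_entry - 8;
}
```

So a function that is called with an aligned stack always starts 8 bytes off, and any adjustment of the form 16n+8 (8, 24, 40, ...) realigns it before it calls something else.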

In _start, on Windows, if you are using the Win32 API, you must align RSP and reserve room for the shadow space:
Code: [Select]
_start:
  sub  rsp,8+32   ; align RSP and reserve space for shadow space.
  ...
  ; Don't need to restore RSP here...
  xor  ecx,ecx
  jmp  [__imp_ExitProcess]

On SysV ABI (Linux, etc) it is guaranteed that RSP will be aligned to DQWORD on _start entry. On MS-ABI it isn't!
[]s
Fred
28
Programming with NASM / Things to do in _start
« Last post by decuser on March 07, 2024, 04:59:02 AM »
I see a lot of example code out there that has _start like this:

Code: [Select]
push rbp
mov rbp, rsp
and rsp, -16

and:
Code: [Select]
push rsp
mov rbp, rsp
nop

and even just:
Code: [Select]
nop
What's going on? Is there some reason for this stuff? It's not part of the main logic, it seems like it's some kind of setup, but I can't make sense of it. I have heard about stack alignment, maybe this is something to do with that, if so WTF? What do I need to have at the beginning of my code?
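For reference, the `and rsp, -16` in the first snippet rounds RSP down to a multiple of 16, because -16 is the bit mask 0xFFFFFFFFFFFFFFF0 in two's complement. A minimal C sketch of that arithmetic, with the stack pointer modelled as a plain integer (the name `align_down16` is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Model of `and rsp, -16`: clear the low 4 bits of the stack pointer. */
uint64_t align_down16(uint64_t rsp)
{
    uint64_t aligned = rsp & (uint64_t)-16;   /* mask 0xFFFFFFFFFFFFFFF0 */
    /* The result never moves up, and it is always a multiple of 16. */
    assert(aligned <= rsp && aligned % 16 == 0);
    return aligned;
}
```

Because the stack grows downward, rounding down only skips a few unused bytes. The preceding `mov rbp, rsp` keeps the original value around, since the amount subtracted by the mask is not known in advance.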
29
Programming with NASM / Re: Learning Assembler
« Last post by AntonPotapov on March 03, 2024, 04:15:25 PM »
Thank you
30
Programming with NASM / Re: gdb and debug symbols
« Last post by decuser on March 03, 2024, 03:56:36 PM »
You and me both! Screens aren't big enough. Somebody oughtta...