NASM - The Netwide Assembler
NASM Forum => Programming with NASM => Topic started by: decuser on March 07, 2024, 04:59:02 AM
-
I see a lot of example code out there that has _start like this:
push rbp
mov rbp, rsp
and rsp, -16
and:
push rsp
mov rbp, rsp
nop
and even just:
nop
What's going on? Is there some reason for this stuff? It's not part of the main logic, it seems like it's some kind of setup, but I can't make sense of it. I have heard about stack alignment, maybe this is something to do with that, if so WTF? What do I need to have at the beginning of my code?
-
There is no reason for this stuff... Well... not this way and not always.
If you are creating what I call pseudo-assembly code (creating a C program, using the C Runtime and libc in asm), then you must obey the ABI (MS-ABI or SysV ABI). This means RSP must be aligned to DQWORD (16 bytes) at every call. On entry to main(), RSP is misaligned by 8 because the call that reached main() pushed an 8-byte return address, so you must do:
global main
main:
sub rsp,8 ; align RSP
...
add rsp,8 ; restore RSP before returning
xor eax,eax ; return 0;
ret
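As a hedged illustration of the skeleton above, here is a minimal sketch of a main that actually calls into libc. It assumes Linux (SysV ABI) and linking through the gcc driver; the file name, the message and the choice of puts are illustrative assumptions, not something from this thread.

```nasm
; hello_main.asm  (hypothetical file name, SysV/Linux only)
; $ nasm -felf64 -o hello_main.o hello_main.asm
; $ gcc -o hello_main hello_main.o
bits 64
default rel
section .rodata
msg: db "hello from main", 0
section .text
extern puts
global main
main:
sub rsp,8            ; entry RSP is 8 mod 16; now it is 0 mod 16
lea rdi,[msg]        ; 1st argument (SysV): pointer to the string
call puts wrt ..plt  ; call through the PLT so it links as PIE or non-PIE
add rsp,8            ; restore RSP before returning
xor eax,eax          ; return 0
ret
section .note.GNU-stack noalloc noexec nowrite progbits
```

Without the sub rsp,8 the call would be made with RSP at 8 mod 16, which the ABI does not allow even if a particular libc function happens to tolerate it.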
Of course, using libc in _start isn't a good idea (you'll need to initialize the library, the C Runtime).
In _start, on Windows, if you are using the Win32 API, you must align RSP and reserve the shadow space:
_start:
sub rsp,8+32 ; align RSP and reserve space for shadow space.
...
; Don't need to restore RSP here...
xor ecx,ecx
jmp [__imp_ExitProcess]
On SysV ABI (Linux, etc.) it is guaranteed that RSP will be aligned to DQWORD on _start entry. On MS-ABI it isn't!
[]s
Fred
-
That makes sense. When you say, main() has a misaligned RSP, how do you know? Is it because of the way nasm puts the binary together? I am doing pure Linux at the moment and using syscalls with _start, not doing the pseudo C, so I gather from what you're saying that I don't need the prolog. But, I will once I switch to main for the pseudo C stuff, so I'm curious how you know :).
-
It is described in the ABIs. And you can see for yourself:
; test_win64.asm
;
; c:\work> nasm -fwin64 -o test_win64.o test_win64.asm
; c:\work> ld -s -o test_win64.exe test_win64.o -lkernel32
;
bits 64
default rel
section .data
buffer:
db '0x'
times 16 db '0'
db `\n`
bufferLength equ $ - buffer
section .text
extern __imp_GetStdHandle
extern __imp_WriteConsoleA
extern __imp_ExitProcess
global _start
_start:
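mov rdx,rsp ; capture RSP exactly as it was on entry
lea rcx,[buffer+2]
call u64toStr
sub rsp,32+8 ; 32-byte shadow space + 5th argument slot (also aligns RSP if it entered as 8 mod 16)
mov rcx,-11 ; STD_OUTPUT_HANDLE
call [__imp_GetStdHandle]
mov rcx,rax
lea rdx,[buffer]
mov r8d,bufferLength
xor r9,r9
mov qword [rsp+32],0 ; 5th argument (lpReserved) goes just above the shadow space
call [__imp_WriteConsoleA]
xor ecx,ecx
jmp [__imp_ExitProcess]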
; Destroys RAX, RDX and RDI.
align 4
u64toStr:
lea rdi,[rcx+15]
jmp .test
.loop:
mov rax,rdx
and al,0x0f
add al,'0'
cmp al,'9'
jbe .skip
add al,7
.skip:
mov [rdi],al
shr rdx,4
dec rdi
.test:
cmp rdi,rcx
jae .loop
ret
; test_sysv.asm
;
; $ nasm -felf64 -o test_sysv.o test_sysv.asm
; $ ld -s -o test_sysv test_sysv.o
;
bits 64
default rel
section .data
buffer:
db '0x'
times 16 db '0'
db `\n`
bufferLength equ $ - buffer
section .text
global _start
_start:
; NOTE: RSP is already DQWORD-aligned (SysV ABI)!
mov rsi,rsp
lea rdi,[buffer+2]
call u64toStr
mov eax,1
mov edi,eax
lea rsi,[buffer]
mov edx,bufferLength
syscall
xor edi,edi
mov eax,60
syscall
align 4
u64toStr:
lea rcx,[rdi+15]
jmp .test
.loop:
mov rax,rsi
and al,0x0f
add al,'0'
cmp al,'9'
jbe .skip
add al,7
.skip:
mov [rcx],al
shr rsi,4
dec rcx
.test:
cmp rcx,rdi
jae .loop
ret
Running both of them (using MinGW64 for Windows):
c:\work> nasm -fwin64 -o test_win64.o test_win64.asm
c:\work> ld -s -o test_win64.exe test_win64.o -lkernel32
c:\work> test_win64
0x00000034601FF858
$ nasm -felf64 -o test_sysv.o test_sysv.asm
$ ld -s -o test_sysv test_sysv.o
$ ./test_sysv
0x00007FFD2BBF2D90
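Notice the low 4 bits (the last hex digit): 8 on Windows, so RSP is misaligned by 8 there, and 0 on Linux, so it is 16-byte aligned.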
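If printing the whole pointer is more than you need, the same check can be sketched on Linux by returning RSP mod 16 as the exit status. This is only a minimal sketch; the file name is an assumption.

```nasm
; rsp_low4.asm  (hypothetical file name)
; $ nasm -felf64 -o rsp_low4.o rsp_low4.asm
; $ ld -o rsp_low4 rsp_low4.o
; $ ./rsp_low4 ; echo $?
bits 64
section .text
global _start
_start:
mov rdi,rsp   ; RSP exactly as it was at process entry
and edi,15    ; keep the low 4 bits; 0 means 16-byte aligned
mov eax,60    ; sys_exit
syscall       ; exit status = RSP mod 16
```

Given the SysV guarantee for process entry, the status shown by echo $? should be 0.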
Anyway, see the SysV ABI for x86-64, here (https://cs61.seas.harvard.edu/site/pdf/x86-64-abi-20210928.pdf), topic 3.4.1 (in Stack State):
%rsp "The stack pointer holds the address of the byte with lowest address which is part of
the stack. It is guaranteed to be 16-byte aligned at process entry."
On SysV ABI, at process entry RSP points to argc, followed by the argv pointers, a NULL, the envp pointers and another NULL: the main-like arguments (see the ABI).
For MS-ABI you'll have to search the MSDN.
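To make the SysV entry layout concrete, here is a small sketch for Linux that reads argc straight from the stack in _start and returns it as the exit status. The file name is an assumption, and the argv[0] load is there only to show where that pointer sits.

```nasm
; argc_demo.asm  (hypothetical file name)
; $ nasm -felf64 -o argc_demo.o argc_demo.asm
; $ ld -o argc_demo argc_demo.o
; $ ./argc_demo a b ; echo $?
bits 64
section .text
global _start
_start:
mov rdi,[rsp]    ; argc sits at the address RSP points to
mov rsi,[rsp+8]  ; argv[0] follows it (unused here, shown for the layout)
mov eax,60       ; sys_exit
syscall          ; exit status = low 8 bits of argc
```

With two arguments as in the command above, argc is 3 (the program name plus a and b), so echo $? prints 3.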
-
After thinking it through based on what you've said and seeing a lot of FUD type posts elsewhere about this topic, I have come to the conclusion that:
1. Stack alignment in Linux x64 programming is required by the ABI (to 16-byte addresses) if you are calling glibc functions (using main, linking with gcc), because the return address is pushed onto the stack by the call, and that address is 8 bytes, leaving the stack misaligned by 8 bytes on entry. Hence the push rbp (now it's realigned), mov rbp, rsp (save off the current sp), and and rsp, -16 (force alignment).
2. It is probably a good idea to align the stack generally, if you plan to access items placed on the stack prior to your program start (such as ARGC and ARGV).
Otherwise, it's not ... required...
Does this sound reasonable?
-
The reason why, in x86-64 mode, the stack is usually aligned to DQWORD (16 bytes) is SSE. Since SSE/SSE2 is available on every x86-64 capable Intel/AMD microprocessor, floating point operations use SSE/SSE2 by default, and aligned instructions such as movaps require DQWORD alignment of their memory operand (otherwise you'll get a General Protection Fault).
So, yep, it is useful to keep the stack aligned by DQWORD.
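A minimal sketch of the push rbp / mov rbp,rsp / and rsp,-16 prologue from the first post, used here to protect a movaps store to the stack. It assumes Linux with plain _start and syscalls; the file name and the work label are hypothetical.

```nasm
; align_demo.asm  (hypothetical file name)
; $ nasm -felf64 -o align_demo.o align_demo.asm
; $ ld -o align_demo align_demo.o
bits 64
section .text
global _start
_start:
call work          ; RSP is 16-byte aligned here, so work is entered at 8 mod 16
xor edi,edi        ; exit status 0
mov eax,60         ; sys_exit
syscall

work:
push rbp           ; save the caller's RBP
mov rbp,rsp        ; remember RSP before forcing alignment
and rsp,-16        ; round RSP down to a multiple of 16, whatever it was
sub rsp,16         ; one aligned 16-byte local slot
xorps xmm0,xmm0    ; xmm0 = 0
movaps [rsp],xmm0  ; aligned store: faults if RSP is not a multiple of 16
mov rsp,rbp        ; undo the alignment and the local slot
pop rbp
ret
```

The and rsp,-16 only ever moves RSP down, so nothing already on the stack is overwritten, and restoring RSP from RBP makes the epilogue independent of how many bytes the alignment step skipped.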