Author Topic: Things to do in _start  (Read 785 times)

Offline decuser

  • Jr. Member
  • *
  • Posts: 11
  • Country: us
Things to do in _start
« on: March 07, 2024, 04:59:02 AM »
I see a lot of example code out there that has _start like this:

Code: [Select]
push rbp
mov rbp, rsp
and rsp, -16

and:
Code: [Select]
push rsp
mov rbp, rsp
nop

and even just:
Code: [Select]
nop
What's going on? Is there some reason for this stuff? It's not part of the main logic, it seems like it's some kind of setup, but I can't make sense of it. I have heard about stack alignment, maybe this is something to do with that, if so WTF? What do I need to have at the beginning of my code?

Offline fredericopissarra

  • Full Member
  • **
  • Posts: 368
  • Country: br
Re: Things to do in _start
« Reply #1 on: March 07, 2024, 12:43:30 PM »
There is no reason for this stuff... Well... not this way and not always.

If you are writing what I call pseudo-assembly code (a C program written in asm that uses the C Runtime and libc), then you must obey the ABI (MS-ABI or SysV ABI). This means RSP must be aligned to a DQWORD (16 bytes) at every call. On entry to main(), RSP is misaligned by 8 bytes, because the call that reached main() pushed an 8-byte return address, so you must do:
Code: [Select]
  global main
main:
  sub rsp,8    ; align RSP
  ...
  add rsp,8    ; restore RSP before returning
  xor  eax,eax  ; return 0;
  ret
Of course, using libc in _start isn't a good idea (you'll need to initialize the library, the C Runtime).
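A minimal sketch of the main() case on Linux (SysV ABI), assuming the object file is linked with gcc so that the C Runtime initializes libc before main is called. The file name, the fmt label and the format string are made up for illustration.

```nasm
; hello_main.asm (hypothetical file name)
;
;   $ nasm -felf64 -o hello_main.o hello_main.asm
;   $ gcc -no-pie -o hello_main hello_main.o
;
  bits  64
  default rel

  section .rodata

fmt:
  db    'rsp inside main = %p', 10, 0

  section .text

  extern printf

  global main
main:
  sub   rsp,8         ; entry: RSP = 16n+8 (return address was pushed); now 16n
  lea   rdi,[fmt]     ; 1st argument: format string
  mov   rsi,rsp       ; 2nd argument: the value printed by %p
  xor   eax,eax       ; variadic call: AL = number of vector registers used (none)
  call  printf
  add   rsp,8         ; restore RSP before returning
  xor   eax,eax       ; return 0;
  ret

  section .note.GNU-stack noalloc noexec nowrite progbits
```

If the sub rsp,8 is doing its job, the last hex digit of the printed value should be 0.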

In _start on Windows, if you are using the Win32 API, you must align RSP and reserve the shadow space:
Code: [Select]
_start:
  sub  rsp,8+32   ; align RSP and reserve space for shadow space.
  ...
  ; Don't need to restore RSP here...
  xor  ecx,ecx
  jmp  [__imp_ExitProcess]

Under the SysV ABI (Linux, etc.) it is guaranteed that RSP will be aligned to a DQWORD on _start entry. Under the MS-ABI it isn't!
[]s
Fred
« Last Edit: March 07, 2024, 12:48:15 PM by fredericopissarra »

Offline decuser

  • Jr. Member
  • *
  • Posts: 11
  • Country: us
Re: Things to do in _start
« Reply #2 on: March 07, 2024, 01:20:13 PM »
That makes sense. When  you say, main() has a misaligned RSP, how do you know? Is it because of the way nasm puts the binary together? I am doing pure Linux at the moment and using syscalls with _start, not doing the pseudo C, so I gather from what you're saying that I don't need the prolog. But, I will once I switch to main for the pseudo C stuff, so I'm curious how you know  :).

Offline fredericopissarra

  • Full Member
  • **
  • Posts: 368
  • Country: br
Re: Things to do in _start
« Reply #3 on: March 07, 2024, 04:18:50 PM »
It is described in the ABIs. And you can see for yourself:
Code: [Select]
; test_win64.asm
;
;   c:\work> nasm -fwin64 -o test_win64.o test_win64.asm
;   c:\work> ld -s -o test_win64.exe test_win64.o -lkernel32
;
  bits  64
  default rel

  section .data

buffer:
  db    '0x'
  times 16 db '0'
  db    `\n`
bufferLength equ $ - buffer

  section .text

  extern __imp_GetStdHandle
  extern __imp_WriteConsoleA
  extern __imp_ExitProcess

  global _start
_start:
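  mov   rdx,rsp         ; capture RSP exactly as it was on entry
  sub   rsp,8+32        ; 32 bytes of shadow space + 8 for the 5th argument; also aligns RSP to a DQWORD

  lea   rcx,[buffer+2]
  call  u64toStr

  mov   rcx,-11         ; STD_OUTPUT_HANDLE
  call  [__imp_GetStdHandle]
 
  mov   rcx,rax         ; hConsoleOutput
  lea   rdx,[buffer]    ; lpBuffer
  mov   r8d,bufferLength
  xor   r9d,r9d         ; lpNumberOfCharsWritten = NULL
  mov   qword [rsp+32],0 ; lpReserved = NULL: the 5th argument goes just above the shadow space
  call  [__imp_WriteConsoleA]

  xor   ecx,ecx
  jmp   [__imp_ExitProcess]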
 
; Destroys RAX, RDX and RDI.
  align 4
u64toStr:
  lea   rdi,[rcx+15]
  jmp   .test
.loop:
  mov   rax,rdx
  and   al,0x0f
  add   al,'0'
  cmp   al,'9'
  jbe   .skip
  add   al,7
.skip:
  mov   [rdi],al
  shr   rdx,4
  dec   rdi
.test:
  cmp   rdi,rcx
  jae   .loop
  ret
Code: [Select]
; test_sysv.asm
;
;    $ nasm -felf64 -o test_sysv.o test_sysv.asm
;    $ ld -s -o test_sysv test_sysv.o
;
  bits  64
  default rel

  section .data

buffer:
  db    '0x'
  times 16 db '0'
  db    `\n`
bufferLength equ $ - buffer

  section .text

  global _start
_start:
  ; NOTE: RSP is already aligned to a DQWORD here (SysV ABI)!

  mov   rsi,rsp
  lea   rdi,[buffer+2]
  call  u64toStr

  mov   eax,1
  mov   edi,eax
  lea   rsi,[buffer]
  mov   edx,bufferLength
  syscall

  xor   edi,edi
  mov   eax,60
  syscall
 
  align 4
u64toStr:
  lea   rcx,[rdi+15]
  jmp   .test
.loop:
  mov   rax,rsi
  and   al,0x0f
  add   al,'0'
  cmp   al,'9'
  jbe   .skip
  add   al,7
.skip:
  mov   [rcx],al
  shr   rsi,4
  dec   rcx
.test:
  cmp   rcx,rdi
  jae   .loop
  ret
Running both of them (using MinGW64 for Windows):
Code: [Select]
c:\work> nasm -fwin64 -o test_win64.o test_win64.asm
c:\work> ld -s -o test_win64.exe test_win64.o -lkernel32
c:\work> test_win64
0x00000034601FF858
Code: [Select]
$ nasm -felf64 -o test_sysv.o test_sysv.asm
$ ld -s -o test_sysv test_sysv.o
$ ./test_sysv
0x00007FFD2BBF2D90
Notice the lowest 4 bits (the last hex digit): 8 on Windows, 0 on Linux.

Anyway, see the SysV ABI for x86-64, here, topic 3.4.1 (in Stack State):

Quote
%rsp: "The stack pointer holds the address of the byte with lowest address which is part of the stack. It is guaranteed to be 16-byte aligned at process entry."

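Under the SysV ABI, on _start entry RSP points to argc, followed by the argv pointers, a NULL, the envp pointers and another NULL: the same data main() receives as arguments (see the ABI).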
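A small sketch of reading that initial stack from _start on Linux, with no libc involved. The file name is hypothetical, and the program simply returns argc as its exit status so the result can be checked from the shell.

```nasm
; argc_exit.asm (hypothetical file name)
;
;   $ nasm -felf64 -o argc_exit.o argc_exit.asm
;   $ ld -o argc_exit argc_exit.o
;   $ ./argc_exit a b c ; echo $?
;
  bits  64

  section .text

  global _start
_start:
  mov   rdi,[rsp]           ; argc sits at the top of the stack
  lea   rsi,[rsp+8]         ; argv: address of argv[0]
  lea   rdx,[rsi+rdi*8+8]   ; envp: just past argv's NULL terminator (unused here)

  mov   eax,60              ; exit(argc): RDI already holds argc
  syscall
```

Run with three arguments as in the comment, argc is 4 (program name plus three arguments), so the shell reports an exit status of 4.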

For the MS-ABI you'll have to search MSDN.
« Last Edit: March 07, 2024, 04:33:18 PM by fredericopissarra »

Offline decuser

  • Jr. Member
  • *
  • Posts: 11
  • Country: us
Re: Things to do in _start
« Reply #4 on: March 09, 2024, 07:56:59 PM »
After thinking it through based on what you've said and seeing a lot of FUD type posts elsewhere about this topic, I have come to the conclusion that:

1. Stack alignment in Linux x64 programming is required by the ABI (to 16-byte addresses) if you are calling glibc functions (using main, linking with gcc), because the return address is pushed onto the stack by the call and that address is 8 bytes, causing the stack to be misaligned by 8 bytes. Hence the push rbp (now it's realigned), mov rbp, rsp (save off the current sp), and and rsp, -16 (realign).

2. It is probably a good idea to align the stack generally, if you plan to access items placed on the stack prior to your program start (such as ARGC and ARGV).

Otherwise, it's not ... required...

Does this sound reasonable?
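For reference, the push rbp / mov rbp, rsp / and rsp, -16 prologue from the first post can be tried in isolation. This is only a sketch for Linux: the file name and the helper label are invented, and _start misaligns RSP on purpose so that the realignment has something to fix.

```nasm
; realign.asm (hypothetical file name)
;
;   $ nasm -felf64 -o realign.o realign.asm
;   $ ld -o realign realign.o
;   $ ./realign ; echo $?
;
  bits  64

  section .text

  global _start
_start:
  push  rax           ; deliberately knock RSP off its 16-byte alignment
  call  helper        ; helper is entered with an alignment it cannot rely on

  xor   edi,edi       ; exit(0)
  mov   eax,60
  syscall

helper:
  push  rbp           ; save the caller's RBP
  mov   rbp,rsp       ; remember RSP as it is after the push
  and   rsp,-16       ; clear the low 4 bits: round RSP down to a multiple of 16
  sub   rsp,16        ; 16 bytes of scratch space, still aligned
  xorps xmm0,xmm0
  movaps [rsp],xmm0   ; aligned store: safe whatever RSP was on entry
  leave               ; mov rsp,rbp / pop rbp: undo the realignment
  ret
```

The and can only move RSP downward, so nothing already on the stack is overwritten, and leave restores the exact value saved in RBP no matter how many bytes the and removed.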

Offline fredericopissarra

  • Full Member
  • **
  • Posts: 368
  • Country: br
Re: Things to do in _start
« Reply #5 on: March 09, 2024, 11:39:26 PM »
The reason why, in x86-64 mode, the stack is usually aligned to a DQWORD (16 bytes) is SSE. Since SSE/SSE2 is available on every x86-64 capable Intel/AMD microprocessor, floating point operations use SSE/SSE2 by default, and instructions such as movaps require DQWORD alignment for their memory operands (otherwise you'll get a General Protection Fault).

So, yep, it is useful to keep the stack aligned by DQWORD.
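A minimal sketch of that movaps requirement on Linux, relying on RSP being 16-byte aligned at _start entry. The file name is hypothetical, and the faulting store is left commented out so the program exits cleanly as written.

```nasm
; movaps_align.asm (hypothetical file name)
;
;   $ nasm -felf64 -o movaps_align.o movaps_align.asm
;   $ ld -o movaps_align movaps_align.o
;
  bits  64

  section .text

  global _start
_start:
  sub   rsp,16              ; RSP is 16-byte aligned on entry and stays aligned
  xorps xmm0,xmm0
  movaps [rsp],xmm0         ; OK: the address is a multiple of 16
  ; movaps [rsp+8],xmm0     ; would raise #GP (seen as SIGSEGV on Linux): address is 16n+8

  xor   edi,edi             ; exit(0)
  mov   eax,60
  syscall
```

Uncommenting the second store is a quick way to see the fault; movups would accept the same address, since it has no alignment requirement.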