NASM - The Netwide Assembler
NASM Forum => Using NASM => Topic started by: munair on November 14, 2021, 10:37:04 AM
-
I'm fairly new to NASM. I began using it more than a year ago for a 32-bit compiler (sharpbasic.com).
Going over some 32-bit code, I found two ways to get the address of a string to be printed, i.e. both give the same output. What I have used so far is:
' address of string buffer
emittl("mov ebx, _sb_buf11")
' convert integer to string (expect address in ebx)
emittl("call _sb_intstr")
' print
emittl("call _sb_print")
But to obtain the address of the string buffer LEA works as well:
emittl("lea ebx, dword [_sb_buf11]")
Which is the better approach?
-
It depends on the mode. In x86-64 mode LEA is smaller and has fewer collateral effects than MOV when obtaining the address of an object in the .data, .bss or .rodata sections, because it can use RIP-relative addressing. Compare:
mov rax,x ; puts the address of x in RAX
lea rax,[x] ; same thing...
Both instructions have a REX prefix because of their 64-bit target (RAX). With `mov rax,x` the absolute linear address of 'x' must be known at runtime and, since it is a 64-bit value, the instruction is encoded in 10 bytes (48 B8 for `mov rax, imm64`, plus 8 bytes for the address). That 'x' is a constant that must be added to the program base address, which is known only after the image is loaded (consider ASLR, for example!), so it needs a load-time relocation. 'LEA' is different here... '[ x ]' is usually encoded as a RIP-relative address, so only a 32-bit offset is encoded in the instruction and, since the offset is relative to the address of the next instruction, no load-time relocation is needed. 'LEA', here, is encoded in only 7 bytes, putting less pressure on the L1I cache and on the processor's internal reorder buffer (and prefetch queue).
So... LEA is preferable in x86-64 mode to get the address of static objects. In i386 mode (32 bits) there is no RIP-relative addressing, so both forms encode the absolute address; MOV is one byte shorter (5 bytes against 6), but otherwise it makes no practical difference.
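A minimal sketch of the two forms as a complete NASM source, assuming a 64-bit ELF target; the variable `x` and the labels `get_x_abs` and `get_x_rel` are made up for illustration. One NASM-specific caveat: NASM 2.x treats `[x]` as an absolute address unless `default rel` is in effect or the operand is written `[rel x]`, so the RIP-relative encoding has to be requested.

```nasm
; Assumed build line: nasm -f elf64 addr.asm
bits 64
default rel                 ; make plain [x] mean [rel x]

section .data
x:      dq 0                ; hypothetical static object

section .text
global get_x_abs
global get_x_rel

get_x_abs:
        mov rax, x          ; 48 B8 + imm64: 10 bytes, absolute address
        ret

get_x_rel:
        lea rax, [x]        ; 48 8D 05 + disp32: 7 bytes, RIP-relative
        ret
```

Without the `default rel` line, writing `lea rax, [rel x]` explicitly should give the same 7-byte encoding.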
-
Thanks a lot for your explanation fredericopissarra. It is very helpful.
BTW, I realized I started this topic in the wrong place. Any moderator is welcome to put it in the right place. ;)
-
Another tip: In i386 mode avoid using EBP as the stack "base" pointer. This practice is common in real mode (16 bits) because only BP or BX can be used as base pointers in an effective address calculation. In 386 protected mode any general-purpose register (but not EIP or EFLAGS) can be used as a base, so to access objects on the stack you can use ESP directly.
In i386 mode EAX, EBX, ECX, EDX, ESI, EDI and EBP can be used (depending on the calling convention). If you use EBP as a substitute for ESP then you end up with only 6 GPRs available for your routines, instead of 7.
So, the prologue:
push ebp
mov ebp,esp
and the epilogue:
pop ebp
ret
Can be (and should be) avoided.
Let's say you build a function taking two integers. Like this, in C:
int f(int a, int b) { return a + b; }
Instead of doing:
f:
push ebp
mov ebp,esp
mov eax,[ebp+8]
add eax,[ebp+12]
pop ebp
ret
You can write:
f:
mov eax,[esp+4]
add eax,[esp+8]
ret
-
That is a really great tip! And I should take that to the 64 bits version later on as well, I guess (I'm certainly not there yet).
-
Instead of doing:
f:
push ebp
mov ebp,esp
mov eax,[ebp+8]
add eax,[ebp+12]
pop ebp
ret
You can write:
f:
mov eax,[esp+4]
add eax,[esp+8]
ret
But while the stack base pointer is used to address parameters, the stack pointer is adjusted for local variables:
emittl("push ebp")
emittl("mov ebp, esp")
if n > 0 then // number of local variables
emittl("sub esp, " + str(n * 4))
end if
-
But while the stack base pointer is used to address parameters, the stack pointer is adjusted for local variables:
emittl("push ebp")
emittl("mov ebp, esp")
if n > 0 then // number of local variables
emittl("sub esp, " + str(n * 4))
end if
There is no need... You can still allocate local objects by manipulating ESP directly. Suppose you have a function that needs a local DWORD. You can do:
struc fstk
.localvar: resd 1 ; local var.
.localsize:
.retaddr: resd 1 ; return address.
.arg1: resd 1 ; First argument.
endstruc
f:
; allocate local space on stack.
sub esp,fstk.localsize
mov eax,[esp+fstk.arg1] ; gets first arg
...
mov [esp+fstk.localvar],eax ; store in local var (.localvar is 0).
; deallocate local space on stack.
add esp,fstk.localsize
ret
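To make the layout concrete, here is a small self-contained variant of the same idea, assuming the i386 C calling convention. The structure `addstk`, the function `add_twice` and its C prototype are invented for this sketch; only the struc technique comes from the post above.

```nasm
; int add_twice(int a, int b) { int tmp = a + b; return tmp + tmp; }
bits 32

struc addstk
.tmp:       resd 1          ; local variable, at [esp+0] after the sub
.localsize:                 ; number of bytes of locals (4 here)
.retaddr:   resd 1          ; return address pushed by CALL
.a:         resd 1          ; first argument
.b:         resd 1          ; second argument
endstruc

section .text
global add_twice
add_twice:
        sub esp, addstk.localsize   ; allocate locals
        mov eax, [esp+addstk.a]
        add eax, [esp+addstk.b]
        mov [esp+addstk.tmp], eax   ; tmp = a + b
        add eax, [esp+addstk.tmp]   ; eax = tmp + tmp
        add esp, addstk.localsize   ; release locals
        ret
```

Adding another local only means adding one `resd` line above `.localsize`; the argument offsets follow automatically.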
-
I understand that there is no need (actually there is, see my next post). BUT, the nice thing IMO about setting up a stack frame is that parameters are at EBP+ offsets while local variables are at EBP- offsets (with EBP-4 as the function result). For a compiler emitting asm code this looks easier to me, and the address calculation is easier too: the stack pointer is simply lowered by [the number of local variables] * 4. To illustrate what I mean, have a look at the stack frame on this cheat sheet: https://www.cs.uaf.edu/2006/fall/cs301/support/x86/
-
Did you notice that, by using a structure, you don't need to remember where the arguments or the stack-allocated local objects are?
-
Did you notice that, by using a structure, you don't need to remember where the arguments or the stack-allocated local objects are?
It's not to remember but to make life simple. Giving your proposal of using the stack pointer without stack frame some thought, I think it is not a good idea for the simple reason that the stack pointer may change during the execution of a function. Suppose the code within a function calls another function and (local) variables are pushed on the stack to pass them as parameters. How do you access local variables after the stack pointer has changed? This is where the base pointer comes in. No matter what happens to the stack pointer, the base pointer makes sure that the offsets to local variables and parameters don't change.
-
I think it is not a good idea for the simple reason that the stack pointer may change during the execution of a function. Suppose the code within a function calls another function and (local) variables are pushed on the stack to pass them as parameters. How do you access local variables after the stack pointer has changed?
That's why compilers (C compilers, for instance) keep track of (E|R)SP... Suppose you have a function f() calling a function g(), each one with one argument, using the i386 C calling convention:
int g(int x) { return x + x; }
int f(int x) { return g(x)+1; }
The generated code is something like this (without using EBP):
g:
mov eax,[esp+4]
add eax,eax
ret
f:
mov eax,[esp+4]
push eax
call g
add esp,4 ; stack cleanup: ESP is back at a known position.
inc eax
ret
Using the struct approach you'll always be sure where the arguments and local vars of a function are, and you don't need to remember the offsets. All you have to do is pop the pushed arguments from the stack after the called function returns (or before...).
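The case raised earlier in the thread, reading a parameter while arguments are already being pushed, can be sketched as follows. This is hand-written under the i386 C calling convention, not compiler output, and the functions `f2` and `g2` are made up for the example. The only bookkeeping is that every PUSH adds 4 to the offsets used until the matching cleanup.

```nasm
; int g2(int a, int b) { return a + b; }
; int f2(int x)        { int t = x + 1; return g2(x, t); }
bits 32
section .text

g2:
        mov eax, [esp+4]        ; a
        add eax, [esp+8]        ; b
        ret

f2:
        sub esp, 4              ; t at [esp+0], ret addr at [esp+4], x at [esp+8]
        mov eax, [esp+8]        ; x
        inc eax
        mov [esp], eax          ; t = x + 1
        push eax                ; second argument (t); ESP is now 4 lower
        push dword [esp+8+4]    ; first argument (x): usual offset plus 4 for the push
        call g2
        add esp, 8              ; drop both arguments; t is at [esp+0] again
        add esp, 4              ; release the local
        ret
```

A code generator can keep a single counter of bytes pushed since the prologue and add it to every ESP-relative offset it emits.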
In some compilers (Pascal, for instance) the responsibility for stack cleanup lies with the called function ("ret" accepts an argument for that)... The same code, with the pascal convention:
g:
mov eax,[esp+4]
add eax,eax
ret 4
f:
mov eax,[esp+4]
push eax
call g
inc eax
ret 4
The point is: Why use a prologue/epilogue nowadays? On the old pre-386 processors, if you wanted to access data on the stack you had only two options: using POP, or using BP as the base pointer in an effective address. It was not possible to use registers other than BP or BX as a base pointer, or registers other than SI or DI as an index (and there was no 'scale'), so using BP was mandatory. From the 386 on this is no longer the case.
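For comparison, a sketch of the earlier two-argument function in 16-bit code, assuming a near call and 16-bit arguments; the label `f16` is invented here. It shows why the BP frame was needed there: SP is not a valid base register in 16-bit addressing.

```nasm
; int f16(int a, int b) { return a + b; }   (16-bit ints, near call)
bits 16
f16:
        push bp
        mov bp, sp
        mov ax, [bp+4]      ; a: saved BP (2 bytes) + return address (2 bytes)
        add ax, [bp+6]      ; b
        ; mov ax, [sp+4]    ; not encodable: NASM rejects it as an invalid effective address
        pop bp
        ret
```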
-
What is the difference between the struct approach and the stack frame approach other than keeping R/EBP free? Maybe I misunderstand. I have seen examples of GCC (32bits) output producing stack frames, so it still seems a legitimate technique.
Currently in the SharpBASIC compiler the stack cleanup is done after the function call, as it was also messed up with the push instructions before the call. Seems more logical to me, but it's a matter of opinion. Here is a simple, stupid example I used to test function calls in the expression parser:
' SharpBASIC function
' -------------------
incl "lib/sys.sbi";
decl func five(n: int8): int8;
dim sum: int8;
main do
sum = five(5 * 5) + five(5 + 5);
print sum;
end;
func five(n: int8): int8
do
five = n;
end;
Generated asm code (without any code optimizations):
SECTION .text
global _start
global _end
_start:
movsx eax, byte [_C3]
push eax
movsx eax, byte [_C3]
pop edx
imul edx
push eax
call _I26
add esp, 4
push eax
movsx eax, byte [_C3]
push eax
movsx eax, byte [_C3]
pop edx
add eax, edx
push eax
call _I26
add esp, 4
pop edx
add eax, edx
; save sum
mov [_I27], al
; load sum
movsx eax, byte [_I27]
; print int
mov ebx, _sb_buf12
call _sb_intstr
call _sb_print
call _sb_printlf
_end:
mov ebx, 0
mov eax, 1
int 80h
_I26:
push ebp
mov ebp, esp
sub esp, 4
; init func five
mov byte [ebp - 4], 0
; load n
movsx eax, byte [ebp + 8]
; save func result five
mov [ebp - 4], al
._L0:
; load func result five
movsx eax, byte [ebp - 4]
mov esp, ebp
pop ebp
ret
extern _sb_intstr
extern _sb_print
extern _sb_printlf
extern _sb_buf12
SECTION .rodata
_C3 db 5
SECTION .bss
; define sum
_I27 resb 1
When you say Pascal compiler, which compiler do you mean? There are several of them, both commercial and free. Same goes for C compilers.
-
What is the difference between the struct approach and the stack frame approach other than keeping R/EBP free? Maybe I misunderstand. I have seen examples of GCC (32bits) output producing stack frames, so it still seems a legitimate technique.
Smaller and faster code. Try the -fomit-frame-pointer and -O2 options with GCC...
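As a rough illustration of the size difference, here is the `_I26` function from the generated code above rewritten by hand without a frame pointer. This is a sketch, not GCC or SharpBASIC output, and it keeps the same unoptimized load/store sequence so that only the frame handling changes.

```nasm
bits 32
section .text
_I26:
        sub esp, 4                  ; result slot at [esp+0], ret addr at [esp+4], n at [esp+8]
        ; init func five
        mov byte [esp], 0
        ; load n
        movsx eax, byte [esp + 8]
        ; save func result five
        mov [esp], al
        ; load func result five
        movsx eax, byte [esp]
        add esp, 4                  ; replaces mov esp, ebp / pop ebp
        ret
```

The `push ebp` / `mov ebp, esp` pair disappears and the two-instruction epilogue becomes a single `add esp, 4`, with EBP left free for other uses.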
When you say Pascal compiler, which compiler do you mean? There are several of them, both commercial and free. Same goes for C compilers.
Turbo Pascal, Free Pascal and Delphi... And the old "pascal" calling convention used by C compilers back in the 90's, 2000's: Turbo C, Borland C++, MSC6, ...
-
When I get to optimization options for the compiler, fomitting the frame pointer will probably be one of them. Thanks again for the suggestion.