Disclaimer:
The following contains "idea code" and as such should not be copy/pasted for use in production systems!
It is semi-random, yet it provides insight into the issues currently being addressed.
Also: Warning - rants ahead!
Imagine, if you will, you are trying to encapsulate 64-bit fastcall calling conventions of Linux and Windows using macros, oh, say, like NASMX.
On Windows - function parameters are passed in RCX, RDX, R8, R9 for ints/ptrs and xmm0-3 for float types.
On Linux - parameters are passed in RDI, RSI, RDX, RCX, R8, R9 for ints/ptrs and xmm0-7 for float types.
Now, Windows REQUIRES a register storage area (the "shadow space") on the stack, directly above the function return address, which reserves room for the registers used as parameters to "spill" into if needed once the function has been called. This gives your code a slot to reference (ie: [rbp+16]) for storing (spilling) a register or reading it back. When CALLING a function you must provide this storage area PRIOR to the call (ie: sub rsp,32), after pushing any remaining parameters that exceed the registers allocated.
One trick is to figure out, from within the function, which call has the most parameters and use that count (ie: sub rsp,count * 8). HOWEVER, having to define the stack frame prologue BEFORE you have encountered all the calls made within the function makes this technique impossible to use.
proc myproc
; sub rsp,??? ; myproc can't know this yet: ( func3 has 6 params ( 6 * 8 ) = sub rsp,48 )
; must do this instead...arrrg..looks like debug code
invoke func1, qword a, qword b, qword c, qword d
sub rsp, 32
; put args into regs
call func1
add rsp, 32
invoke func2, qword a, qword b, qword c, qword d, qword e
sub rsp, 48 ; 5 * 8 = 40, rounded up to 48 so RSP stays 16-byte aligned at the call
; put args into regs and stack
call func2
add rsp, 48
invoke func3, qword a, qword b, qword c, qword d, qword e, qword f
sub rsp, 48
; put args into regs and stack
call func3
add rsp, 48
; add rsp,??? ; would be nice to only have this outer frame!
endproc
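For what it's worth, the "largest call wins" arithmetic is easy to sketch outside the macro layer. The C below is only an illustration: the function names are made up, and it assumes RSP is 16-byte aligned once the prologue has run, so the reserved area is rounded up to a multiple of 16. It does not solve the ordering problem (the prologue is still emitted before the calls are seen); it only shows what number the prologue would want.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes one Win64 call needs: at least the 32-byte shadow space,
   plus 8 bytes for every argument beyond the fourth.
   (Hypothetical helper, not part of NASMX.) */
size_t win64_call_bytes(size_t nargs)
{
    size_t slots = nargs < 4 ? 4 : nargs;
    return slots * 8;
}

/* Largest call area among all calls made by one procedure, rounded up
   to a multiple of 16 (assumes RSP is 16-byte aligned after the prologue). */
size_t win64_frame_call_area(const size_t *arg_counts, size_t ncalls)
{
    size_t max_bytes = 0;
    assert(arg_counts != NULL || ncalls == 0);
    for (size_t i = 0; i < ncalls; i++) {
        size_t bytes = win64_call_bytes(arg_counts[i]);
        if (bytes > max_bytes)
            max_bytes = bytes;
    }
    return (max_bytes + 15) & ~(size_t)15;
}
```

For the three calls in myproc (4, 5 and 6 parameters) this gives 48, matching the func3 figure in the comment.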
Linux has no caller-provided shadow space. Instead, the callee spills register parameters into its own frame, located below the frame pointer (ie: [rbp-8]) or based from RSP (ie: [rsp+40] if you'd rather make RBP available as a general register). It doesn't use the Windows register shadow space convention but has its own convention (naturally).
The two systems also differ in how they assign parameters to registers.
Let's use an example:
int myfunc(char* p, int x, double d, int z, float f);
Assume a standard stack frame prologue:
push rbp
mov rbp, rsp
Windows will store the params as:
RCX - p
RDX - x
xmm2 - d
R9 - z
[rbp+48] - f
It's easy to set up and define offsets from the frame based on the current arg count:
%assign %$frame_offset 16
%rep %$argcount
%$argname EQU %$frame_offset ; assign offset
%assign %$frame_offset 8+%$frame_offset
%endrep
; spill first register
mov [rbp+p], rcx ; positive offset from rbp
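The Windows rule is purely positional, which is why the offset loop above is so simple. A small C model of it, as a sketch only: the type and function names are invented here, it assumes the standard push rbp / mov rbp, rsp prologue, and it handles only scalar ints/ptrs and float/double.

```c
#include <assert.h>

enum w64_class { W64_INT, W64_FLOAT };          /* ints/ptrs vs float/double */
enum w64_kind  { W64_GPR, W64_XMM, W64_STACK };

struct w64_loc {
    enum w64_kind kind;
    int reg;          /* GPR: 0=rcx 1=rdx 2=r8 3=r9; XMM: register number; -1 for stack */
    int home_offset;  /* offset from RBP of the argument's home (spill) slot */
};

/* Win64 assigns by position: argument `index` (0-based) owns slot `index`,
   whether it lands in a GPR, an XMM register, or on the stack. */
struct w64_loc win64_arg_location(enum w64_class cls, int index)
{
    struct w64_loc loc;
    assert(index >= 0);
    loc.home_offset = 16 + 8 * index;   /* skip saved RBP and return address */
    if (index >= 4) {
        loc.kind = W64_STACK;
        loc.reg = -1;
    } else {
        loc.kind = (cls == W64_FLOAT) ? W64_XMM : W64_GPR;
        loc.reg = index;
    }
    return loc;
}
```

Fed the myfunc prototype, the double in position 2 comes back as xmm2 and the float in position 4 comes back as a stack argument at [rbp+48], the same as the table above.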
Linux, correct me if I'm wrong, does this:
RDI - p
RSI - x
xmm0 - d
RDX - z
xmm1 - f
Keeping in mind that no args have been pushed to the stack, the spill area is defined below the frame pointer:
%assign %$frame_offset 0
%rep %$argcount
%assign %$frame_offset 8+%$frame_offset
%$argname EQU %$frame_offset ; assign offset
%endrep
; spill first register
mov [rbp-p], rdi ; negative offset from rbp
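Unlike Windows, the Linux assignment keeps two independent counters, one for the integer registers and one for the xmm registers. A rough C model follows, as a sketch under assumptions: the names are hypothetical, only scalar ints/ptrs and float/double are handled (no structs, long double or varargs), and anything that no longer fits in a register is assumed to sit on the stack starting at [rbp+16].

```c
#include <assert.h>
#include <stddef.h>

enum sv_class { SV_INT, SV_FLOAT };            /* ints/ptrs vs float/double */
enum sv_kind  { SV_GPR, SV_XMM, SV_STACK };

struct sv_loc {
    enum sv_kind kind;
    int value;   /* GPR: 0=rdi 1=rsi 2=rdx 3=rcx 4=r8 5=r9; XMM: register
                    number; STACK: offset from RBP */
};

/* Fills out[i] with the location of argument i for a list of n arguments. */
void sysv_assign(const enum sv_class *args, size_t n, struct sv_loc *out)
{
    int next_gpr = 0;     /* integer registers are handed out in order */
    int next_xmm = 0;     /* xmm registers have their own counter */
    int stack_off = 16;   /* first stack argument, above saved RBP and return address */

    assert(args != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (args[i] == SV_INT && next_gpr < 6) {
            out[i] = (struct sv_loc){ SV_GPR, next_gpr++ };
        } else if (args[i] == SV_FLOAT && next_xmm < 8) {
            out[i] = (struct sv_loc){ SV_XMM, next_xmm++ };
        } else {
            out[i] = (struct sv_loc){ SV_STACK, stack_off };
            stack_off += 8;
        }
    }
}
```

For myfunc(char*, int, double, int, float) this yields rdi, rsi, xmm0, rdx, xmm1, which agrees with the list above.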
So, spill areas are different, no big deal. HOWEVER (you knew it was coming, right?) let's see what the following little prototype deals us:
// btw - don't let me catch you doing this :p
int myfunc( int a, int b, int c, int d, int e, int f, int g, int h );
Spill Area
Windows:
[rbp+72] = h
[rbp+64] = g
[rbp+56] = f
[rbp+48] = e
R9 = d ; [rbp+40]
R8 = c ; [rbp+32]
RDX = b ; [rbp+24]
RCX = a ; [rbp+16]
Linux:
RDI = a ; [rbp-8]
RSI = b ; [rbp-16]
RDX = c ; [rbp-24]
RCX = d ; [rbp-32]
R8 = e ; [rbp-40]
R9 = f ; [rbp-48]
[rbp+16] = g ; <-- ack!!!
[rbp+24] = h ; <-- ack!!!
Yes, that's right: when you run out of registers, THEN the stack located ABOVE the frame pointer is used, beginning at [rbp+16] as if it were a normal cdecl call starting from param 1!!
Now account for how your users will call your macro:
%ifidni __OUTPUT_FORMAT__,win64
%define argv(v) rbp+v
%elifidni __OUTPUT_FORMAT__,elf64
%define argv(v) rbp-v
%endif
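mov rcx, [argv(f)] ; elf64: f was spilled below rbp, so rbp-f is fine
mov rdx, [argv(g)] ; elf64: expands to [rbp-g], but g really lives at [rbp+16]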
Yep, broken. Best way to fix? Require your Linux users to not exceed the allocated registers? Bleh, but workable.
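One possible way out, offered only as a sketch: give each argument a signed offset, so that the accessor is always rbp+v and the sign carries the "above or below the frame" decision. The C below models that for all-integer argument lists. The function names are invented for illustration, and the Linux register slots below RBP are this post's spill convention rather than anything the ABI mandates.

```c
#include <assert.h>

/* Signed RBP-relative offset of the i-th (0-based) integer argument on
   Windows: every argument, register or stack, has a home slot above RBP. */
int win64_int_arg_offset(int i)
{
    assert(i >= 0);
    return 16 + 8 * i;
}

/* Same for Linux, assuming the callee spills its six register arguments
   in order just below RBP; arguments 7 and up are already on the stack
   above RBP, starting at +16. */
int sysv_int_arg_offset(int i)
{
    assert(i >= 0);
    if (i < 6)
        return -8 * (i + 1);      /* spill slot below the frame pointer */
    return 16 + 8 * (i - 6);      /* caller-pushed stack argument */
}
```

With offsets like these, an EQU could hold -48 for f and 16 for g on elf64, and a single %define argv(v) rbp+v might then serve both output formats. This has not been tried against NASMX itself.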
I like the Windows system of maintaining parameter order according to cdecl, but I like the Linux register allocation.
The trials and tribulations of software engineering. What to do, oh what to do...