NASM-X prologue/epilogue optimizations



Rob Neff:
So I've had some optimized code for NASM-X sitting idle in preparation for the next release that I've decided to check in. Specifically, the prologue and epilogue routines are more aware of what the user has defined. They now attempt to avoid emitting extraneous opcodes under certain scenarios.

For example:

--- Code: ---proc myproc
locals none
xor __AX, __AX ; <- this builds in both 32 and 64-bit environments ( portability! )
endproc

--- End code ---

logically creates the following procedure definition (if 64-bit assembled):

--- Code: ---myproc:
push rbp
xor rax, rax
pop rbp
ret

--- End code ---

Seeing as this proc definition has neither parameters nor local variables, we now optimize out the "mov rbp, rsp" since no equates are built for offsets from rbp. The rbp push/pop pair is still required to keep the stack properly aligned, so there is no need for an "add rsp, xxx" either. However, in order to maintain alignment, you may still encounter what at first glance appears to be an extra "add rsp, XX" if you pass an odd number of registers to USES.
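
To illustrate the odd-register case, here is a hand-written sketch in plain NASM. It is not verbatim macro output: the label name, the choice of rbx as the single saved register and the exact instruction order are assumptions made for the example.

--- Code: ---bits 64
section .text

; hypothetical proc with one register in USES, no params, no locals
myproc_uses1:
    push rbp          ; entry rsp is 8 mod 16; after this push it is 0 mod 16
    push rbx          ; the one saved register knocks rsp back to 8 mod 16
    sub  rsp, 8       ; padding so rsp is 16-byte aligned again
    xor  rax, rax     ; body
    add  rsp, 8       ; the adjustment that looks "extra" at first glance
    pop  rbx
    pop  rbp
    ret

--- End code ---

With an even number of saved registers the padding pair disappears, because rbp plus the saved registers then add up to an odd number of pushes, which is exactly what a 16-byte aligned stack needs after the call.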

For such a simple example it would seem trivial to optimize out the push/pop used for the alignment requirement. However, we don't know whether INVOKE will be called later in the procedure, so we play it safe. Perhaps by using the assembler pass sequence and performing fix-ups in a different pass we could optimize further. I'd be open to suggestions here, but it seems like a lot of extra work for small procedures that the developer would probably be better off hand-optimizing anyway.

One of the bigger changes is that, under the covers, the Linux 64-bit spill area has been shifted to after the USES register-store. This had been a sore spot for me for quite some time, as this area represents nothing more than temporary space similar to local variables. Doing this lets us redefine the prologue format across all platforms back to what you're probably used to: using push (rather than mov) in the prologue when USES is encountered. This reduces procedure size and should speed up execution, even if only slightly.
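
For comparison, a hand-written sketch of the two ways of saving two registers. The labels and the registers rbx and rsi are made up for the example, and neither routine is claimed to match the old or new macro output exactly.

--- Code: ---bits 64
section .text

; (a) mov-style save: reserve the area first, then store into it
save_with_mov:
    push rbp
    sub  rsp, 16
    mov  [rsp+8], rbx
    mov  [rsp], rsi
    ; ... body ...
    mov  rsi, [rsp]
    mov  rbx, [rsp+8]
    add  rsp, 16
    pop  rbp
    ret

; (b) push-style save: each push both reserves the slot and stores the register
save_with_push:
    push rbp
    push rbx
    push rsi
    ; ... body ...
    pop  rsi
    pop  rbx
    pop  rbp
    ret

--- End code ---

Each push or pop of rbx or rsi encodes in a single byte, while every mov to or from a stack slot takes several bytes, and the push form needs no separate sub/add on rsp.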

Please download a snapshot and give it a workout for your particular environment(s). I'm interested in any bugs you may find or possible improvements. Post 'em here - and thank you!

ps: I also fixed a few demos and updated the html docs a bit.

edit: clarity

encryptor256:
Hey,
nice topic,
now I can sing the old song again, x64! :D

If a procedure has arguments,
then it must save the arguments
in shadow space without being asked.

But how is it now?

Right now the arguments are saved
by the invoke macro, which is incorrect.
The caller doesn't have to touch the procedure's shadow space.
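
A minimal sketch of that split in plain NASM, no macros, assuming the Win64 convention. The labels and the constants are made up for the example.

--- Code: ---bits 64
section .text

; callee: homes its two register arguments itself
myproc:
    mov  [rsp+8], rcx     ; arg0 home slot, just above the return address
    mov  [rsp+16], rdx    ; arg1 home slot
    mov  rax, [rsp+8]
    add  rax, [rsp+16]    ; use the saved copies
    ret

; caller: only reserves the 32 bytes, never writes into them
caller:
    sub  rsp, 40          ; 32 bytes shadow space + 8 to keep rsp 16-byte aligned
    mov  rcx, 1
    mov  rdx, 2
    call myproc
    add  rsp, 40
    ret

--- End code ---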

MSDN Resource: Prolog and Epilog

"The prolog saves argument registers in their home addresses if required"

"If required" means: if the procedure has arguments.

And again:
If a procedure has arguments,
then it must save the arguments
in shadow space without being asked.

Yes, the procedure macro and not the invoke macro.

Looking at it from the other side:
What's the point of this procedure
if I have provided arguments
but the macro does not save them?

--- Code: ---bits 64

%include "nasmx.inc"

proc myproc, ptrdiff_t arg0, ptrdiff_t arg1
locals none
xor rax,rax
endproc

--- End code ---

NDisasm:

--- Code: ---0000003C 55 push rbp
0000003D 4889E5 mov rbp,rsp
00000040 4831C0 xor rax,rax
00000043 5D pop rbp
00000044 C3 ret

--- End code ---

--- Quote from: Rob Neff on February 27, 2014, 12:10:16 AM ---ps: I also fixed a few demos and updated the html docs a bit.

--- End quote ---

* Demo 17 still wants to be Demo 16. :)

Bye.

Rob Neff:

--- Quote from: encryptor256 on February 27, 2014, 09:13:10 AM ---MSDN Resource: Prolog and Epilog

"The prolog saves argument registers in their home addresses if required"

"If required" means: if the procedure has arguments.

And again:
If a procedure has arguments,
then it must save the arguments
in shadow space without being asked.

--- End quote ---

Who is in the better position of determining whether the registers should be saved or not - you the developer or the macros you use? It is my opinion that the developer has the clearer understanding of what the requirements for his/her PROC are. Thus, NASM-X creates the frame properly and gives you the option of whether you should save registers or not.

For Win64 programming you as the developer may not need to use the shadow space at all. Why would you want multiple statements similar to "mov [rbp+ofst], rcx" automatically inserted into your procedure if you never intended to access that area anyway? That could generate up to 4 additional mov operations per procedure. For Linux x64 it would be even worse, as you have access to many more registers for parameter passing.
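
To make the cost concrete, this is roughly what a fully automatic Win64 spill would add to a framed procedure. It is a hand-written sketch, not output of any existing NASM-X option, and the label is made up.

--- Code: ---bits 64
section .text

myproc_autospill:
    push rbp
    mov  rbp, rsp
    mov  [rbp+16], rcx    ; home slot of arg 1 (8 saved rbp + 8 return address)
    mov  [rbp+24], rdx    ; home slot of arg 2
    mov  [rbp+32], r8     ; home slot of arg 3
    mov  [rbp+40], r9     ; home slot of arg 4
    xor  rax, rax         ; body that never reads any of those slots
    pop  rbp
    ret

--- End code ---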

The macro INVOKE provides you the opportunity to prefill shadow space before the call because you may be calling a non-conformant function in a shared library - a function that you didn't write or have source to. See also this thread which discusses whether invoke should spill registers or not.

Let's look at your example:

--- Quote from: encryptor256 on February 27, 2014, 09:13:10 AM ---From the other side of view:
What's the point of this procedure
if I have provided arguments
but the macro does not save them?

--- End quote ---

--- Quote from: encryptor256 on February 27, 2014, 09:13:10 AM ---
--- Code: ---bits 64

%include "nasmx.inc"

proc myproc, ptrdiff_t arg0, ptrdiff_t arg1
locals none
xor rax,rax
endproc

--- End code ---

NDisasm:

--- Code: ---0000003C 55 push rbp
0000003D 4889E5 mov rbp,rsp
00000040 4831C0 xor rax,rax
00000043 5D pop rbp
00000044 C3 ret

--- End code ---

--- End quote ---

This code actually provides a very good reason why NASM-X lets you decide whether to spill registers or not. You've defined a proc that has parameters, thus NASM-X must emit "mov rbp, rsp" in order to allow you to access the parameter stack space later on. Whether you actually access that space is not a NASM-X concern.

In that example you don't make any reference to the stack at all. Your procedure would then be better written as:

--- Code: ---bits 64

%include "nasmx.inc"

proc myproc
locals none
xor rax,rax
endproc

--- End code ---

In summary, I'd rather leave the control of register spilling to the developer. We can, of course, add another pragma to turn automatic proc register spilling on or off.
I'll keep that on the discussion table for now for additional input and consideration.

--- Quote from: encryptor256 on February 27, 2014, 09:13:10 AM ---
--- Quote from: Rob Neff on February 27, 2014, 12:10:16 AM ---ps: I also fixed a few demos and updated the html docs a bit.

--- End quote ---

* Demo 17 still wants to be Demo 16. :)

--- End quote ---

Hah! I forgot about that one, thanks! :)

encryptor256:
Hehe, I don't know what to say. :D

Opinion collision... reboot.

--- Quote from: Rob Neff on February 27, 2014, 03:55:16 PM ---See also this thread which discusses whether invoke should spill registers or not.

--- End quote ---

No, that "spill registers or not" was a misunderstanding.

He said, in my words: "No matter how many arguments a procedure has (0, 1, 2, 3, 4, 5, ...), it will always save those four register arguments."

If a procedure has 0 arguments, it will save those four arguments anyway.
If a procedure has 1 argument, it will save those four arguments anyway.
If a procedure has 2 arguments, it will save those four arguments anyway.
If a procedure has 3 arguments, it will save those four arguments anyway.

This is how it should be:

If a procedure has 0 arguments, it will save 0 register arguments on the stack.
If a procedure has 1 argument, it will save 1 register argument on the stack.
If a procedure has 2 arguments, it will save 2 register arguments on the stack.
If a procedure has 3 arguments, it will save 3 register arguments on the stack.
And so on...

After four arguments there is no need to save anything more, because every argument beyond the fourth is passed on the stack.

What the invoke macro does now is what the proc macro must do.

Hello, echo? If I have provided two arguments via the proc macro, of course I want them to be saved in shadow space.

Somebody would like to:
Build applications for speed? - Macros? No! Go back to square one, assembly - no macros.

Macros are for: insertion of changeable code patterns with small modifications.

:)

Rob Neff:

--- Quote from: encryptor256 on February 27, 2014, 04:43:34 PM ---If I have provided two arguments via the proc macro, of course I want them to be saved in shadow space.

Somebody would like to:
Build applications for speed? - Macros? No! Go back to square one, assembly - no macros.

Macros are for: insertion of changeable code patterns with small modifications.

:)

--- End quote ---

OK. I think I understand what you're saying. Basically, the simple act of defining the PROC with parameters means that you will be accessing the spill area. Otherwise, if you weren't planning on doing that then you would define a simple PROC with zero parameters.

That seems reasonable for a PROC with 4 or fewer arguments. But what about a PROC with 5 or more arguments, where we may or may not want to spill registers but still need the stack frame built and the param offsets equated in order to access the 5th, 6th, 7th, or 8th parameter(s)?
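
A small hand-written sketch of that situation, assuming Win64 and written without the macros. The label is made up, and the [rbp+48] offset follows from the convention rather than from NASM-X's equates.

--- Code: ---bits 64
section .text

; args 1-4 arrive in rcx, rdx, r8, r9; arg 5 was placed on the stack by the caller
myproc5:
    push rbp
    mov  rbp, rsp
    mov  rax, [rbp+48]    ; 5th argument: 8 (saved rbp) + 8 (return address) + 32 (shadow space)
    add  rax, rcx         ; 1st argument used straight from its register, never spilled
    pop  rbp
    ret

--- End code ---

The frame is needed to reach the 5th parameter, yet none of the four register arguments has to be written to its home slot.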
