Offline Rob Neff

  • Forum Moderator
  • Full Member
  • *****
  • Posts: 429
  • Country: us
NASM-X prologue/epilogue optimizations
« on: February 27, 2014, 12:10:16 AM »
So I've had some optimized code for NASM-X sitting idle in preparation for the next release that I've decided to check in.  Specifically, the prologue and  epilogue routines are more aware of what the user has defined.  They now attempt to avoid emitting extraneous opcodes under certain scenarios.

For example:
Code: [Select]
proc myproc
locals none
    xor  __AX, __AX   ; <- this builds in both 32 and 64-bit environments ( portability! )
endproc

logically creates the following procedure definition (if 64-bit assembled):
Code: [Select]
myproc:
    push rbp
    xor  rax, rax
    pop  rbp
    ret

Seeing as this proc definition has neither parameters nor local variables, we now optimize out the "mov rbp, rsp" ("mov ebp, esp" in 32-bit builds) as there are no equates built for offsets from rbp.  The rbp push/pop pair is still required to maintain proper stack alignment, so there is no need for a "sub rsp, XX" / "add rsp, XX" pair either.  However, in order to maintain alignment, you may still encounter what would at first glance appear to be an extra "add rsp, XX" if you pass an odd number of registers to USES.
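To make the alignment point concrete, here is a rough hand-written sketch of a 64-bit procedure with a single saved register. It is not actual NASM-X output; the label and the choice of rbx are assumptions made purely for illustration.

```nasm
bits 64
section .text

; Hand-written illustration, NOT generated by NASM-X.
; On entry rsp is 8 mod 16, because the call pushed the return address.
uses_one_reg:
    push rbp            ; rsp is now 16-byte aligned
    push rbx            ; one saved register: rsp is 8 mod 16 again
    sub  rsp, 8         ; padding so that a later call sees an aligned stack
    xor  rax, rax
    add  rsp, 8         ; the adjustment that looks redundant at first glance
    pop  rbx
    pop  rbp
    ret
```

With an even number of saved registers the pushes cancel out and the padding pair is not needed.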

For such a simple example it would seem trivial to optimize out the push/pop for the alignment requirement.  However, we don't know if INVOKE will be called sometime later in the procedure, so we play it safe.  Perhaps by using the assembler pass sequence and performing fix-ups in a different pass we could optimize further.  I'd be open to suggestions here but it seems like a lot of extra work for such small procedures that would probably be better for the developer to hand optimize anyways.

One of the bigger changes is that, under the covers, the Linux 64-bit spill area has been shifted to after the USES register-store.  This had been a sore spot for me for quite some time as this area represented nothing more than temporary space similar to local variables.  Doing this enables us to redefine the prologue format across all platforms back to what you're probably used to: using push ( rather than mov ) in the prologue when USES is encountered.  This obviously helps in reducing procedure size and should speed up execution even if slightly.
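The size difference between the two prologue styles can be seen in a small hand-written comparison. The labels and the exact layout of the mov-based form are assumptions for illustration, not a copy of what NASM-X emitted before or after this change; the encodings in the comments are the standard x86-64 ones.

```nasm
bits 64
section .text

; Form A: save registers with push (1 byte each, 2 bytes for r8-r15).
save_with_push:
    push rbp                ; 55
    push rbx                ; 53
    push r12                ; 41 54
    ; ... procedure body ...
    pop  r12
    pop  rbx
    pop  rbp
    ret

; Form B: reserve space first, then store with mov (4-5 bytes each).
save_with_mov:
    push rbp                ; 55
    sub  rsp, 16            ; 48 83 EC 10
    mov  [rsp+8], rbx       ; 48 89 5C 24 08
    mov  [rsp], r12         ; 4C 89 24 24
    ; ... procedure body ...
    mov  r12, [rsp]
    mov  rbx, [rsp+8]
    add  rsp, 16
    pop  rbp
    ret
```

Both forms leave rsp 16-byte aligned for the body; Form A simply gets there in fewer bytes.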

Please download a snapshot and give it a workout for your particular environment(s).  I'm interested in any bugs you may find or possible improvements.  Post 'em here - and thank you!

ps: I also fixed a few demos and updated the html docs a bit.

edit: clarity
« Last Edit: February 27, 2014, 02:26:28 AM by Rob Neff »

Offline encryptor256

  • Full Member
  • **
  • Posts: 250
  • Country: lv
  • Win64 .
    • On Youtube: encryptor256
Re: NASM-X prologue/epilogue optimizations
« Reply #1 on: February 27, 2014, 09:13:10 AM »
Hey,
nice topic,
now I can sing the old song again, x64! :D

If a procedure has arguments,
then it must save those arguments
in the shadow space without being asked.

But how is it now?

Right now the arguments are saved
by the invoke macro, which is incorrect.
The caller doesn't have to touch the procedure's shadow space.

MSDN Resource: Prolog and Epilog

"The prolog saves argument registers in their home addresses if required"

"If required" means: if the procedure has arguments.

And again:
If a procedure has arguments,
then it must save those arguments
in the shadow space without being asked.

Yes, the procedure macro and not the invoke macro.

From the other point of view:
What's the point of this procedure
if I have provided arguments
but the macro does not save them?
Code: [Select]
bits 64

%include "nasmx.inc"

proc myproc, ptrdiff_t arg0, ptrdiff_t arg1
locals none
xor rax,rax
endproc

NDisasm:
Code: [Select]
0000003C  55                push rbp
0000003D  4889E5            mov rbp,rsp
00000040  4831C0            xor rax,rax
00000043  5D                pop rbp
00000044  C3                ret

Quote from: Rob Neff
ps: I also fixed a few demos and updated the html docs a bit.

* Demo 17 still wants to be Demo 16.  :)

Bye.
Encryptor256's Investigation \ Research Department.

Offline Rob Neff

Re: NASM-X prologue/epilogue optimizations
« Reply #2 on: February 27, 2014, 03:55:16 PM »
Quote from: encryptor256
MSDN Resource: Prolog and Epilog

"The prolog saves argument registers in their home addresses if required"

"If required" means: if the procedure has arguments.

And again:
If a procedure has arguments,
then it must save those arguments
in the shadow space without being asked.

Who is in the better position of determining whether the registers should be saved or not - you the developer or the macros you use?  It is my opinion that the developer has the clearer understanding of what the requirements for his/her PROC are.  Thus, NASM-X creates the frame properly and gives you the option of whether you should save registers or not.

For Win64 programming you as the developer may not need to use the shadow space at all.  Why would you want multiple statements similar to "mov [rbp+ofst], rcx" automatically inserted into your procedure if you never intended to access that area anyway?  That could generate up to 4 additional mov operations per procedure.  For Linux x64 it would be even worse, as the System V AMD64 ABI passes up to six integer parameters (plus eight floating-point ones) in registers.
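For reference, this is roughly what a Win64 prologue with unconditional spilling would look like. It is a hand-written sketch under the assumption of a standard rbp frame, not output from any existing NASM-X option, and the label is made up.

```nasm
bits 64
section .text

; Hypothetical prologue with automatic spilling (Win64), hand-written.
; After "push rbp / mov rbp, rsp" the four home slots sit at rbp+16 .. rbp+40.
auto_spill_example:
    push rbp
    mov  rbp, rsp
    mov  [rbp+16], rcx      ; home slot of argument 1
    mov  [rbp+24], rdx      ; home slot of argument 2
    mov  [rbp+32], r8       ; home slot of argument 3
    mov  [rbp+40], r9       ; home slot of argument 4
    xor  rax, rax           ; body that never reads any of the slots
    pop  rbp
    ret
```

The four stores are the "up to 4 additional mov operations" in question; they are wasted work whenever the body never reads the home slots.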

The macro INVOKE provides you the opportunity to prefill shadow space before the call because you may be calling a non-conformant function in a shared library - a function that you didn't write or have source to.  See also this thread which discusses whether invoke should spill registers or not.

Let's look at your example:

Quote from: encryptor256
From the other point of view:
What's the point of this procedure
if I have provided arguments
but the macro does not save them?


Code: [Select]
bits 64

%include "nasmx.inc"

proc myproc, ptrdiff_t arg0, ptrdiff_t arg1
locals none
xor rax,rax
endproc

NDisasm:
Code: [Select]
0000003C  55                push rbp
0000003D  4889E5            mov rbp,rsp
00000040  4831C0            xor rax,rax
00000043  5D                pop rbp
00000044  C3                ret

This code actually provides a very good reason why NASM-X lets you decide whether to spill registers or not.  You've defined a proc that has parameters, thus NASM-X must "mov rbp, rsp" in order to allow you to access the parameter stack space later on.  Whether you actually access that space is not a NASM-X concern.

In that example you don't make any reference to the stack at all.  Your procedure would then be better written as:
Code: [Select]
bits 64

%include "nasmx.inc"

proc myproc
locals none
xor rax,rax
endproc

In summary, I'd rather leave the control of register spilling to the developer.  We can, of course, add another pragma to turn automatic proc register spilling on or off.
I'll keep that on the discussion table for now for additional input and consideration.

Quote from: encryptor256
ps: I also fixed a few demos and updated the html docs a bit.

* Demo 17 still wants to be Demo 16.  :)

Hah!  I forgot about that one, thanks! :)

Offline encryptor256

Re: NASM-X prologue/epilogue optimizations
« Reply #3 on: February 27, 2014, 04:43:34 PM »
Hehe, I don't know what to say. :D

Opinion collision... reboot.

Quote from: Rob Neff
See also this thread which discusses whether invoke should spill registers or not.

No, that "spill registers or not" was a misunderstanding.

He said, in my words: "No matter how many arguments a procedure has (0, 1, 2, 3, 4, 5, ...), it will always save those four register arguments."

If a procedure has 0 arguments, it will save those four arguments anyway.
If a procedure has 1 argument, it will save those four arguments anyway.
If a procedure has 2 arguments, it will save those four arguments anyway.
If a procedure has 3 arguments, it will save those four arguments anyway.

This is how it should be:

If a procedure has 0 arguments, it saves 0 register arguments on the stack.
If a procedure has 1 argument, it saves 1 register argument on the stack.
If a procedure has 2 arguments, it saves 2 register arguments on the stack.
If a procedure has 3 arguments, it saves 3 register arguments on the stack.
And so on...

After four arguments there is no need to save anything more, because every argument beyond the fourth is passed on the stack.
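The "save only as many as were declared" rule can be sketched with plain NASM preprocessor conditionals. SPILL_ARGS and the two_args label are hypothetical names invented for this sketch; they are not part of NASM-X, and the macro assumes it is used at procedure entry before rsp has been changed.

```nasm
bits 64

; Hypothetical helper, NOT part of NASM-X.
; Spills the first N Win64 register arguments to their home slots.
; Any count above 4 still spills only four, since the rest are
; already on the stack.
%macro SPILL_ARGS 1
  %if %1 >= 1
    mov [rsp+8*1], rcx
  %endif
  %if %1 >= 2
    mov [rsp+8*2], rdx
  %endif
  %if %1 >= 3
    mov [rsp+8*3], r8
  %endif
  %if %1 >= 4
    mov [rsp+8*4], r9
  %endif
%endmacro

section .text

; Usage: a two-argument procedure emits exactly two stores.
two_args:
    SPILL_ARGS 2
    mov rax, [rsp+8*1]      ; read argument 1 back from its home slot
    add rax, [rsp+8*2]      ; add argument 2
    ret
```

A count of 0 expands to nothing, so a parameterless procedure pays no cost.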

What the invoke macro does now is what the proc macro should be doing.

Hello, echo? If I have provided two arguments via the proc macro, of course I want them to be saved in the shadow space.

Somebody would like to:
Build applications for speed? Macros? No! Go back to square one: assembly, no macros.

Macros are for inserting changeable code patterns with small modifications.


:)

Offline Rob Neff

Re: NASM-X prologue/epilogue optimizations
« Reply #4 on: February 27, 2014, 05:07:57 PM »
Quote from: encryptor256
If I have provided two arguments via the proc macro, of course I want them to be saved in the shadow space.

Somebody would like to:
Build applications for speed? Macros? No! Go back to square one: assembly, no macros.

Macros are for inserting changeable code patterns with small modifications.

:)

OK.  I think I understand what you're saying.  Basically, the simple act of defining the PROC with parameters means that you will be accessing the spill area.  Otherwise, if you weren't planning on doing that then you would define a simple PROC with zero parameters.

That seems reasonable for a PROC with 4 or fewer arguments.  But what about a PROC with 5 or more arguments where we may or may not want to spill registers but we need the stack frame built and param offsets equated in order to access the 5th, 6th, 7th, or 8th parameter(s)?

Offline encryptor256

Re: NASM-X prologue/epilogue optimizations
« Reply #5 on: February 27, 2014, 06:26:09 PM »
This sentence seems absurd to me:
"Basically, the simple act of defining the PROC with parameters means that you will be accessing the spill area."

Well, of course I'm going to access the spill area, anyone would, because that is where (in the spill area, the shadow space) the first four arguments live.

The spill area / shadow space is 32 bytes long; if the procedure has more than four arguments, the extra ones sit on the stack directly above it.


Quote from: Rob Neff
Otherwise, if you weren't planning on doing that then you would define a simple PROC with zero parameters.

Yes, generally no one needs to access the spill space / shadow space if the procedure doesn't have any arguments.


Quote from: Rob Neff
But what about a PROC with 5 or more arguments where we may or may not want to spill registers but we need the stack frame built and param offsets equated in order to access the 5th, 6th, 7th, or 8th parameter(s)?

Yes, and parameters/arguments are stack/base frame offsets.

The stack space is already allocated by the caller.
If your procedure has 13 arguments, then the caller must allocate the space to store those arguments in order to make a valid call to this 13-argument procedure.
If your procedure has 13 arguments, the caller allocates 13 * 8 bytes + stack alignment space.
If your procedure has 13 arguments:
1. The first four arguments are passed in registers and, on the procedure side, they are saved into the spill space / shadow space / register home address space.
1.1. !!! If the procedure does not save these first four arguments, they might be lost, for example when it calls another procedure.
2. The other 9 arguments are already stored on the stack.
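The "13 * 8 + stack alignment space" arithmetic can be written out with NASM equates. This is only a sketch: the symbol names and caller_sketch are invented here, the callee is left as a comment because it is hypothetical, and the caller is assumed to use a standard rbp frame.

```nasm
bits 64

; Hypothetical names, for illustration only.
%assign ARG_COUNT   13
%assign ARG_BYTES   ARG_COUNT * 8               ; 104 bytes of argument slots
%assign CALL_BYTES  (ARG_BYTES + 15) & ~15      ; rounded up to 112

section .text

caller_sketch:
    push rbp
    mov  rbp, rsp               ; rsp is 16-byte aligned here
    sub  rsp, CALL_BYTES        ; still aligned after the subtraction
    ; Arguments 1-4 would go into rcx, rdx, r8, r9; their home slots
    ; [rsp+8*0] .. [rsp+8*3] are reserved but left unwritten.
    ; Arguments 5-13 would be stored at [rsp+8*4] .. [rsp+8*12].
    ; call thirteen_arg_proc    ; hypothetical callee
    mov  rsp, rbp
    pop  rbp
    ret
```

The extra 8 bytes beyond 104 are the "stack alignment space": they keep rsp a multiple of 16 at the call.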

+ Yes, there is a need for a stack frame pointer, like RBP.

Later on, in the procedure that has those 13 arguments:
The first argument is accessed via RBP+8*2.
The second argument is accessed via RBP+8*3.
The third argument is accessed via RBP+8*4.
And so on...
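Continuing the same pattern, argument n sits at RBP+8*(n+1) once the frame is set up, so the fifth argument (the first one passed on the stack) is at RBP+8*6. A minimal hand-written sketch, with a made-up label, that returns argument 1 plus argument 5:

```nasm
bits 64
section .text

; Hypothetical Win64 procedure: returns arg1 + arg5.
sum_first_and_fifth:
    mov  [rsp+8*1], rcx     ; spill argument 1 to its home slot
    push rbp
    mov  rbp, rsp
    ; [rbp]      saved rbp
    ; [rbp+8]    return address
    ; [rbp+8*2]  home slot of argument 1 (rbp+8*3..8*5 for arguments 2-4)
    ; [rbp+8*6]  argument 5, written by the caller
    mov  rax, [rbp+8*2]
    add  rax, [rbp+8*6]
    pop  rbp
    ret
```

Only the register arguments need a store in the prologue; argument 5 is read straight from where the caller put it.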

Basic idea:

Catch the idea, or my insanity (it seems I'm the only one who understands what I'm talking about). :D

Did I ever tell you the definition of "insanity"??

:D

x64 -> Macro idea versus plain code.

1. Macro procedure example:
Code: [Select]
; *******************************************
;
; Procedure: testProc
; argument: ptrMessage
; argument: ptrTitle
;
; Procedure takes two arguments and calls MessageBox.
; MessageBox itself takes four: hWnd, lpText, lpCaption, uType.
;
proc testProc, ptrdiff_t ptrMessage, ptrdiff_t ptrTitle
    locals none

    invoke MessageBox, 0, ptrdiff_t [argv(.ptrMessage)], ptrdiff_t [argv(.ptrTitle)], 0

    xor rax, rax
endproc

2. Plain code procedure example:
Code: [Select]
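; *******************************************
;
; Procedure: testProcA
; argument: ptrMessage
; argument: ptrTitle
;
; Procedure takes two arguments and calls MessageBoxA.
;
align 16
testProcA:

    ;
    ; Save arguments into their home slots (shadow space)
    ;
    mov [rsp+8*1], rcx
    mov [rsp+8*2], rdx

    ; After "push rbp / mov rbp,rsp" the home slots start at rbp+8*2
    %define ptrMessage rbp+8*2
    %define ptrTitle rbp+8*3

    ;
    ; Create stack frame and reserve shadow space for the callee
    ;
    push rbp
    mov rbp, rsp
    lea rsp, [rsp-8*4]

    ; MessageBoxA(hWnd, lpText, lpCaption, uType)
    xor rcx, rcx            ; hWnd = NULL
    mov rdx, [ptrMessage]   ; lpText
    mov r8, [ptrTitle]      ; lpCaption
    xor r9, r9              ; uType = 0 (MB_OK)
    call MessageBoxA

    xor rax, rax

.quit:
    ;
    ; Clear stack and quit
    ;
    lea rsp, [rsp+8*4]
    pop rbp
    ret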

So the procedure testProcA takes two arguments, and I save them into the spill space / shadow space / home location.
Later on, I use "ptrMessage" to access the first argument, and so on.

And by the way, this is only my suggestion.
At least that part felt so wrong: the invoke macro
trying to store the first four arguments on the stack before a call is made.

Inside the procedure, the procedure's own code is responsible for saving the first four argument registers into the spill space / shadow space / home location.
And that's it.
Bye,
Encryptor256.

Offline Rob Neff

Re: NASM-X prologue/epilogue optimizations
« Reply #6 on: February 27, 2014, 07:05:17 PM »
What you've just described is exactly how NASM-X already works - with the exception of automatically saving register parameters to the shadow space in the prologue.

So I see a few options for PROC with parameter arguments:
  • automatically save registers to shadow space
  • automatically save registers to shadow space UNLESS some pragma has disabled the feature.
  • only save registers to shadow space if some pragma has enabled the feature
  • force the developer to save to the shadow space if s/he requires it.  ( <-- this is what NASM-X currently does )

I already understand which option you personally prefer.  ;)

Offline encryptor256

Re: NASM-X prologue/epilogue optimizations
« Reply #7 on: February 28, 2014, 07:25:28 AM »
Yes, "automatically save registers into shadow space".
And:
if there are 2 arguments, then save only those two;
if there are 3 arguments, then save only those three.

User "miz", in the thread "Typo in nasmx.inc and issue with invoke on win64", said (in my words): "No matter how many arguments a procedure has, it will always save those four arguments, even if the procedure has only two or maybe even none."

Quote from: Rob Neff
See also this thread which discusses whether invoke should spill registers or not.

"Spill registers or not": the question is not "yes or no / turn on or turn off", but how many registers to save.

He didn't say to turn them on or off. He said that if a procedure has zero arguments, then it saves those argument registers anyway.

This should be a pragma option,
enabled by default: "saving register parameters to the shadow space in the prologue."
If there are 2 arguments, then save only those two.
If there are 3 arguments, then save only those three.

This should be a pragma option,
disabled by default: "force the developer to save to the shadow space if s/he requires it.  ( <-- this is what NASM-X currently does )".

"( <-- this is what NASM-X currently does )"

Wrong:
a newcomer will not know that he has to save the arguments manually.



"What you've just described is exactly how NASM-X already works"

Yes, and it works wrong.
Which part? This invoke macro part:
Code: [Select]
xor rcx,rcx
mov [rsp],rcx
call ExitProcess

That is not the Win64 calling convention.
The invoke macro doesn't have to save the first four arguments on the stack; that is the procedure's duty.
The first four arguments are passed in registers.
The option "FASTCALL_STACK_PRELOAD", which stores the first four registers on the stack before a call is made, is wrong on Win64.

Let's say I design and call ExitProcess; this is how it should be:

Design:
Code: [Select]
ExitProcess:

    ;
    ; Procedure has one argument, so
    ; save arguments - only one
    ;
    mov [rsp+8*1], rcx

    ;
    ; ... Other code
    ;

    ret

Call:
Code: [Select]
xor rcx,rcx
call ExitProcess

The first four arguments must be saved on the procedure side, NOT on the invoke side.
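For completeness, the caller's share of the work under the Win64 convention is to reserve the 32-byte shadow space and keep the stack aligned; it does not have to write the register values into that space. A minimal hand-written sketch (the entry label is made up, and linking against kernel32 is assumed):

```nasm
bits 64
extern ExitProcess          ; provided by kernel32

section .text
global main_sketch

; Hypothetical entry point. On entry rsp is 8 mod 16.
main_sketch:
    sub  rsp, 8*5           ; 32 bytes of shadow space + 8 bytes to realign
    xor  ecx, ecx           ; first argument (exit code 0) travels in rcx
    call ExitProcess        ; the slots are reserved but left unwritten
```

Whether the callee then stores rcx into its home slot is its own decision, which is the division of labor argued for above.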


Off-topic: documentation of the pragma option FASTCALL_STACK_PRELOAD.
A newcomer might struggle to find out which one is the default, ENABLE or DISABLE.



Okay, fine, there is no need to drink hot water here;
it seems NASMX users are satisfied with the way it works,
and I'm just here trying to suggest something that doesn't matter anyway. :)

Bye,
Encryptor256.

Offline Rob Neff

Re: NASM-X prologue/epilogue optimizations
« Reply #8 on: February 28, 2014, 05:14:11 PM »
I certainly agree that newcomers will not fully understand convention requirements.  I'll admit that sometimes I'm too close to the details and thus tend to look at issues from the position of an expert user.  I need to remember to come up for some air and see what additional user issues are lurking there, waiting to bite back.

So, moving the stack pre-load from INVOKE to PROC would solve many potential issues there.  However, again from the point of view of an expert user, there ARE situations that warrant disabling or enabling pre-loading at any point within the source code.  Thus I want pre-load pragma options for both PROC and INVOKE macros so that more advanced users can use them to override the default behavior.  For one or more procedures I may need to turn on pre-loading during an INVOKE or turn off pre-loading in the PROC prologue.

Thus the defaults should work for the majority of cases but we would also have an "escape" mechanism if we need it in order to alter default behavior.
I'll work on it this week-end and hopefully have something for you to play with by Sunday.


Offline encryptor256

Re: NASM-X prologue/epilogue optimizations
« Reply #9 on: February 28, 2014, 06:15:20 PM »
Sounds nice!

Quote from: Rob Neff
hopefully have something for you to play

Well, there is not a lot to play with; I know how it should be. :D

Hey, maybe there is no need to change anything.
The only one who complained was user ankiller, with his topic BUG of "USES MARCO".
And that bug is/was fixed.

If no one complains, then why change something? Users are satisfied with the current release, no new glitches, and so on.

There are other things to work on: what would newcomers like?
- I think they would all like to have some cross-platform IDE: NASM/X IDE, Code Designer, Visual Studio. :D
