NASM - The Netwide Assembler
Related Projects => NASMX => Topic started by: ankiller on February 24, 2014, 07:28:41 AM
-
proc testProc,dword x,dword y
uses ebx,ecx,edx,esi
locals none
mov eax,1
endproc
OD:
00401034 55 push ebp
00401035 89E5 mov ebp,esp
00401037 895D FC mov dword ptr ss:[ebp-0x4],ebx
0040103A 894D F8 mov dword ptr ss:[ebp-0x8],ecx
0040103D 8955 F4 mov dword ptr ss:[ebp-0xC],edx
00401040 8975 F0 mov dword ptr ss:[ebp-0x10],esi
00401043 83EC 10 sub esp,0x10<--------------------------Here ESP is OK
00401046 B8 01000000 mov eax,0x1
0040104B 83C4 10 add esp,0x10<--------------------------Why add ESP?????
0040104E 5E pop esi ;pop what?
0040104F 5A pop edx ;pop what?
00401050 59 pop ecx ;pop what?
00401051 5B pop ebx ;pop what?
00401052 89EC mov esp,ebp
00401054 5D pop ebp
00401055 C2 0800 retn 0x8
-
Confirmed! I see
the same ghosts:
Compile:
%include "nasmx.inc"
proc testProc,dword x,dword y
uses ebx,ecx,edx,esi
locals none
mov eax,1
endproc
With: "nasm -f win32 test.asm"
Disassemble with: "ndisasm -b 32 test.obj"
0000003C 55 push ebp
0000003D 89E5 mov ebp,esp
0000003F 895DFC mov [ebp-0x4],ebx
00000042 894DF8 mov [ebp-0x8],ecx
00000045 8955F4 mov [ebp-0xc],edx
00000048 8975F0 mov [ebp-0x10],esi
0000004B 83EC10 sub esp,byte +0x10
0000004E B801000000 mov eax,0x1
00000053 83C410 add esp,byte +0x10
00000056 5E pop esi
00000057 5A pop edx
00000058 59 pop ecx
00000059 5B pop ebx
0000005A 89EC mov esp,ebp
0000005C 5D pop ebp
0000005D C20800 ret 0x8
But it works, because of "mov esp,ebp".
"works" = "works" * 0.5 => LRESULT: Not really.
Error report
This code prints register EBX before and after the call:
bits 32
%include "nasmx.inc"
section .data use32
align 4
txt: db 13,10,"Hello world: %d",0
section .text use32
global main
extern printf
align 4
proc testProc,dword x,dword y
uses ebx,ecx,edx,esi
locals none
mov eax,1
endproc
align 4
main:
push ebp
push ebx
mov ebp,esp
; Print current EBX
push ebx
push txt
call printf
add esp,8
; Call TESTPROC
push 0xFEEDBEEF
push 0xCAFEBABE
call testProc
; Print current EBX
push ebx
push txt
call printf
add esp,8
mov esp,ebp
pop ebx
pop ebp
ret
Produces output:
Hello world: 2130571264
Hello world: -17973521
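The second value is 0xFEEDBEEF read as a signed 32-bit integer (0xFEEDBEEF - 2^32 = -17973521), so EBX comes back holding one of the pushed arguments. The listing below is the same instruction sequence as the ndisasm output, written as plain NASM with comments tracing where ESP points. The label name is made up for illustration, and the stack contents assume the call from main shown above.

```nasm
bits 32

; Hand annotation of the generated code (label name is hypothetical).
; On entry: [esp] = return address, [esp+4] = 0xCAFEBABE, [esp+8] = 0xFEEDBEEF
testProc_traced:
    push ebp              ; [esp] = caller's ebp
    mov  ebp, esp         ; ebp -> saved ebp
    mov  [ebp-0x4], ebx   ; the four registers are stored below ebp
    mov  [ebp-0x8], ecx
    mov  [ebp-0xc], edx
    mov  [ebp-0x10], esi
    sub  esp, 0x10        ; esp = ebp-0x10, pointing at the saved esi
    mov  eax, 1
    add  esp, 0x10        ; esp = ebp again: the saved registers are skipped
    pop  esi              ; esi = [ebp]     = caller's ebp
    pop  edx              ; edx = [ebp+0x4] = return address
    pop  ecx              ; ecx = [ebp+0x8] = 0xCAFEBABE
    pop  ebx              ; ebx = [ebp+0xc] = 0xFEEDBEEF
    mov  esp, ebp         ; esp is reset, which hides the damage
    pop  ebp
    ret  8                ; returns normally, with four registers clobbered
```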
Fix:
This code prints register EBX before and after the call:
bits 32
%include "nasmx.inc"
section .data use32
align 4
txt: db 13,10,"Hello world: %d",0
section .text use32
global main
extern printf
align 4
;proc testProc,dword x,dword y
; uses ebx,ecx,edx,esi
; locals none
; mov eax,1
;endproc
testProc:
push ebp
mov ebp,esp
mov [ebp-0x4],ebx
mov [ebp-0x8],ecx
mov [ebp-0xc],edx
mov [ebp-0x10],esi
sub esp,byte +0x10
mov eax,0x1
pop esi
pop edx
pop ecx
pop ebx
mov esp,ebp
pop ebp
ret 0x8
align 4
main:
push ebp
push ebx
mov ebp,esp
; Print current EBX
push ebx
push txt
call printf
add esp,8
; Call TESTPROC
push 0xFEEDBEEF
push 0xCAFEBABE
call testProc
; Print current EBX
push ebx
push txt
call printf
add esp,8
mov esp,ebp
pop ebx
pop ebp
ret
Produces output:
Hello world: 2130571264
Hello world: 2130571264
Conclusion:
The procedure returns, but the registers are not saved and restored as they should be.
It would not return correctly at all if the stack pointer were not restored in the procedure's epilogue.
The "mov esp,ebp" only postpones the nasty surprise, which is exactly what happened here.
Without "mov esp,ebp" the error would have been obvious on the first run; instead,
because esp was restored manually, the error surfaced later, as we see now.
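For comparison, here is a minimal hand-written sketch of the conventional way to preserve registers: push them in the prologue and pop them in reverse order in the epilogue. This is not what NASMX generates, and the label name is hypothetical; it only shows the pairing that the generated epilogue fails to respect.

```nasm
bits 32

; Hypothetical hand-written equivalent, not generated by NASMX.
testProc_pushpop:
    push ebp
    mov  ebp, esp
    push ebx              ; save in one order
    push ecx
    push edx
    push esi
    mov  eax, 1           ; procedure body
    pop  esi              ; restore in the reverse order
    pop  edx
    pop  ecx
    pop  ebx
    pop  ebp              ; esp already equals ebp here, no fix-up needed
    ret  8                ; callee removes the two dword arguments
```

Because every pop matches a push, ESP is back at the saved EBP without any add or "mov esp,ebp", so a mismatch would show up immediately instead of being masked.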
Why:
It seems there was a small mix-up with the x64 "non-calling convention architecture".
You see, this is a basic x64 procedure model:
align 16
procedure:
; Prolog
; ...
sub rsp,0x28
; ...
; Epilog
; ...
add rsp,0x28
; ...
ret
Note:
sub - allocate stack space at prolog,
add - deallocate stack space at epilog.
...
Yes, but it's for x64.
Final conclusion:
Lines of code:
- add esp,byte +0x10
- mov esp,ebp
Should be removed, then, it should work, like it should.
Offtopic, How to compress sentence: "Should be removed, then, it should work, like it should."
Like this:
Assign x = Should
Assign y = it
LRESULT: "X be removed, then, Y X work, like Y X."
Bye,
Encryptor256 @ Error Analysis Department. :D
-
What would happen if you put "locals none" first and "uses ..." after?
Best,
Frank
-
Code:
%include "nasmx.inc"
proc testProc,dword x,dword y
locals none
uses ebx,ecx,edx,esi
mov eax,1
endproc
With: "nasm -f win32 test.asm", produced an error:
test.asm:5: fatal: (USES:8) uses directive must come before locals directive.
-
Confirmed twice: I think x32 is mixed with an x64 procedure model.
x64 Code:
bits 64
%include "nasmx.inc"
proc testProc,dword x,dword y
uses rbx,rcx,rdx,rsi
locals none
mov rax,1
endproc
x64 Compile with: "nasm -f win64 test.asm",
x64 Debug with: "ndisasm -b 64 test.obj"
x64 Result:
0000003C 55 push rbp
0000003D 4889E5 mov rbp,rsp
00000040 48895DF8 mov [rbp-0x8],rbx
00000044 48894DF0 mov [rbp-0x10],rcx
00000048 488955E8 mov [rbp-0x18],rdx
0000004C 488975E0 mov [rbp-0x20],rsi
00000050 4883EC20 sub rsp,byte +0x20
00000054 B801000000 mov eax,0x1
00000059 4883C420 add rsp,byte +0x20
0000005D 5E pop rsi
0000005E 5A pop rdx
0000005F 59 pop rcx
00000060 5B pop rbx
00000061 4889EC mov rsp,rbp
00000064 5D pop rbp
00000065 C3 ret
Nice easter egg, can this even work? :D
So,
rsp - 0x20 ; sub rsp,byte +0x20
rsp + 0x20 ; add rsp,byte +0x20
rsp + 0x08 ; pop rsi
rsp + 0x08 ; pop rdx
rsp + 0x08 ; pop rcx
rsp + 0x08 ; pop rbx
Stack underflow or overflow?
It's called a stack underflow.
Bye.
-
Okay, thanks for checking it, Encryptor256. I'd test it myself, but I haven't got NASMX installed (properly) at this point...
Best,
Frank
-
Logically, the USES macro moves the contents of registers to the procedure stack. We defer updating the stack offset until after the LOCALS macro since additional local procedure variables may be defined via LOCAL until we reach ENDLOCALS. That's the main reason why USES must come before LOCALS.
When specifying LOCALS NONE we should be simply offsetting the stack pointer to the correct aligned offset of total registers saved.
Finally, during ENDPROC, we unwind the stack and restore any previously saved registers. This scheme is used identically between x32 and x64 and accounts for the various calling conventions supported.
I'm not sure why you're seeing the insertion of the add esp, 0x10 (or, in Encryptor's x64 case, the add rsp, byte +0x20) line there. I'll investigate when I get some free time.
-
Okay, here is one more x32 test, note "uses ebx":
Compile code with "nasm.exe -f win32 test.asm -o test.obj":
bits 32
%include "nasmx.inc"
proc testProc,dword x,dword y
uses ebx
locals none
mov eax,1
endproc
NDisasm with "ndisasm -b 32 test.obj":
0000003C 55 push ebp
0000003D 89E5 mov ebp,esp
0000003F 895DFC mov [ebp-0x4],ebx
00000042 83EC08 sub esp,byte +0x8
00000045 B801000000 mov eax,0x1
0000004A 83C404 add esp,byte +0x4
0000004D 5B pop ebx
0000004E 89EC mov esp,ebp
00000050 5D pop ebp
00000051 C20800 ret 0x8
Here is one more x64 test, note "uses rbx":
Compile code with "nasm.exe -f win64 test.asm -o test.obj":
bits 64
%include "nasmx.inc"
proc testProc,dword x,dword y
uses rbx
locals none
mov rax,1
endproc
NDisasm with "ndisasm -b 64 test.obj":
0000003C 55 push rbp
0000003D 4889E5 mov rbp,rsp
00000040 48895DF8 mov [rbp-0x8],rbx
00000044 4883EC10 sub rsp,byte +0x10
00000048 B801000000 mov eax,0x1
0000004D 4883C408 add rsp,byte +0x8
00000051 5B pop rbx
00000052 4889EC mov rsp,rbp
00000055 5D pop rbp
00000056 C3 ret
Offtopic:
Path: "nasmx-1.3\demos\win32\DEMO17"
DEMO17 - Bitmaps and Timer example
Demo17 source code claims that
it wants to be Demo16,
but Demo16 already exists.
Partial data section content of demo17.asm:
...
[section .data]
szTitle: declare(NASMX_TCHAR) NASMX_TEXT("Demo 16 - WinFloor"), 0x0
...
* Single byte change in window title is required.
Bye!
-
NDisasm with "ndisasm -b 64 test.obj":
0000003C 55 push rbp
0000003D 4889E5 mov rbp,rsp
00000040 48895DF8 mov [rbp-0x8],rbx
00000044 4883EC10 sub rsp,byte +0x10
00000048 B801000000 mov eax,0x1
0000004D 4883C408 add rsp,byte +0x8
00000051 5B pop rbx
00000052 4889EC mov rsp,rbp
00000055 5D pop rbp
00000056 C3 ret
hmmmm...that's the way it's supposed to work. By keeping the stack aligned to a 16-byte boundary in the prologue any future INVOKE calls within the procedure can happen without penalty. During the epilogue it correctly adjusts the stack offset by first adding 0x8 ( ie: we allocated an additional 8 bytes in the prologue due to only storing one register ) prior to popping off the register(s) saved.
In addition to investigating the original issue this also gives me an idea that the logic can be optimized further when saving registers if we encounter a LOCALS NONE statement. We can eliminate the "add rsp, 8" statement in the epilogue if we pre-align saved registers on the alignment boundary and just start popping. So maybe I can kill two birds with one stone here! 8)
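The padding arithmetic described above can be sketched with NASM's preprocessor. This is a hand-written illustration for the single-register x64 case, not the NASMX macro code; the symbol names (ALIGNMENT, SAVED_BYTES, ALLOCATED, PADDING) and the label are assumptions made up for the example.

```nasm
bits 64

; Hypothetical symbols, not NASMX internals.
%assign ALIGNMENT   16
%assign SAVED_BYTES 8
; Round SAVED_BYTES up to the next multiple of ALIGNMENT.
%assign ALLOCATED   ((SAVED_BYTES + ALIGNMENT - 1) / ALIGNMENT) * ALIGNMENT
%assign PADDING     ALLOCATED - SAVED_BYTES

testProc_padded:
    push rbp
    mov  rbp, rsp
    sub  rsp, ALLOCATED        ; 0x10 for one saved register
    mov  [rbp-8], rbx          ; saved register sits directly below rbp
    mov  eax, 1
%if PADDING > 0
    add  rsp, PADDING          ; skip the unused bytes: rsp = rbp - SAVED_BYTES
%endif
    pop  rbx                   ; reads [rbp-8]
    pop  rbp                   ; rsp equals rbp again after the pop above
    ret
```

With one register, PADDING is 8, which matches the "add rsp,byte +0x8" in the listing. With four 64-bit registers SAVED_BYTES would be 32, ALLOCATED 32 and PADDING 0, so by this arithmetic no add would be expected at all, whereas the earlier four-register listings show an add of the full 0x20 (0x10 in the 32-bit case).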
Path: "nasmx-1.3\demos\win32\DEMO17"
DEMO17 - Bitmaps and Timer example
Demo17 source code claims that
it wants to be Demo16,
but Demo16 already exists.
Partial data section content of demo17.asm:
...
[section .data]
szTitle: declare(NASMX_TCHAR) NASMX_TEXT("Demo 16 - WinFloor"), 0x0
...
* Single byte change in window title is required.
Bye!
Well, at least that one is easy to fix! :D
-
OK, made a bug fix to the USES macro to prevent the issue.
If you want to test it you can grab an entire snapshot (http://sourceforge.net/p/nasmx/code/HEAD/tree/trunk/) or just download nasmx.inc (http://sourceforge.net/p/nasmx/code/HEAD/tree/trunk/inc/nasmx.inc)
-
Did some testing with new nasmx.inc (117,979 bytes).
* The above problems seem to be fixed.
But there is something else.
1. Trying to use more registers
Code compiled with: "nasm.exe -f win64 test.asm -o test.obj"
bits 64
%include "nasmx.inc"
proc testProc,dword x,dword y
uses r10,r11,r12,r13,r14,r15,rdi,rsi,rbx
locals none
xor rax,rax
endproc
Results in:
test.asm:5: error: parser: instruction expected
2. Macro "invoke" saves arguments on the stack before the call is made
I think this should be done in the context of the called procedure, rather than by the invoke macro before the call.
What if somebody calls wsprintfA? That procedure doesn't care
whether somebody saved those arguments into the shadow space beforehand.
It is the responsibility of whoever designed "wsprintfA" to save the registers if he needs to.
This should be the procedure's job rather than the invoke macro's.
Yes, somebody could use FASTCALL_STACK_PRELOAD, but that is not the point.
Of course, backward compatibility may not permit changing this.
.- But what if somebody builds an obj library and calls it from another language, like C?
In C, that somebody will not have access to the invoke macro, so nothing will save somebody's arguments.
.- What if somebody needs to switch between somebody's custom procedures and some system procedures, like wsprintfA?
Switching between each call could be full of pain.
What if, what if, what if... an endless loop of "what ifs".
Who is this "somebody"? Well, no one knows, some mystical person. :D
3. 2.21: NASMX_PRAGMA: Defining pragmas
I was reading the NASMX fables in the book "NASM-X.CHM" and found one thing...
So I made up a puzzle game:
Here is one letter too much, which one? : "FASTCALL_STACK_PRELOAD, [ ENABLED | DISABLE ]"
Yes, you think left right,
this is how it should be:
".... ENABLE ....".
* One byte removal is required.
And that's it, for now!
-
uses r10,r11,r12,r13,r14,r15,rdi,rsi,rbx
test.asm:5: error: parser: instruction expected
The USES macro definition currently only accepts up to 8 parameters. I guess we should expand this to a larger number that accounts for all general regs. Of course, at that point, we may want to use the pushad/popad opcodes for 32-bit apps ( 64-bit apps don't have that luxury ) to get a shorter/faster program.
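For reference, NASM's variadic macro parameters (1-*, %0 and %rotate) allow a register list of any length. The sketch below is modelled on the multipush example in the NASM manual; save_regs and restore_regs are hypothetical helper names, not part of NASMX, and they save with push/pop rather than with the stores that USES emits.

```nasm
bits 64

; Hypothetical helper macros, not part of NASMX.
; "1-*" accepts one or more parameters; %0 is the parameter count.
%macro save_regs 1-*
  %rep %0
        push %1
    %rotate 1
  %endrep
%endmacro

; Rotating backwards first walks the list from the last parameter,
; so registers are popped in the reverse of the push order.
%macro restore_regs 1-*
  %rep %0
    %rotate -1
        pop %1
  %endrep
%endmacro

testProc_many:
    push rbp
    mov  rbp, rsp
    save_regs    r10,r11,r12,r13,r14,r15,rdi,rsi,rbx
    xor  eax, eax
    restore_regs r10,r11,r12,r13,r14,r15,rdi,rsi,rbx
    pop  rbp
    ret
```

This is a leaf procedure only; one that calls other procedures would additionally have to keep rsp 16-byte aligned and reserve shadow space before each call.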
Switching between each call could be full of pain.
I'm not sure what point you're trying to make here. For Win64 simply NASMX_PRAGMA FASTCALL_STACK_PRELOAD, DISABLE and assume nothing about the shadow space other than it exists for you. If you mean something else perhaps create another thread to continue discussing there?
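As background for the shadow space mentioned here: in the Win64 calling convention the caller reserves 32 bytes above the return address, and the callee is free to store its four register arguments there if it wants to. The sketch below is hand-written with hypothetical labels (sum4, caller) and does not show what INVOKE or FASTCALL_STACK_PRELOAD actually emit.

```nasm
bits 64

; Hypothetical callee: homes its register arguments into the shadow space
; that the caller reserved, then adds them up from memory.
sum4:
    mov  [rsp+8],  rcx     ; home slot for the 1st argument
    mov  [rsp+16], rdx     ; 2nd
    mov  [rsp+24], r8      ; 3rd
    mov  [rsp+32], r9      ; 4th
    mov  rax, [rsp+8]
    add  rax, [rsp+16]
    add  rax, [rsp+24]
    add  rax, [rsp+32]
    ret

; Hypothetical caller: only reserves the space and loads the registers.
caller:
    sub  rsp, 0x28         ; 32 bytes of shadow space + 8 to realign rsp to 16
    mov  ecx, 1
    mov  edx, 2
    mov  r8d, 3
    mov  r9d, 4
    call sum4
    add  rsp, 0x28
    ret
```

Here the caller writes nothing into the shadow space; whether to use it is entirely the callee's decision, which is the situation with preloading disabled as described above.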
one? : "FASTCALL_STACK_PRELOAD, [ ENABLED | DISABLE ]"
Good eye! Will fix!
-
Okay, here was one more:
Compile this code with: "nasm -f win64 test.asm -o test.obj"
bits 64
%include "nasmx.inc"
NASMX_PRAGMA FASTCALL_STACK_PRELOAD, DISABLE
IMPORT myprocedure
proc testProc,dword x,dword y
uses rbx
locals none
invoke myprocedure,0xFEED,0xA,0xBEEF
xor rax,rax
endproc
NDisasm with: "ndisasm -b 64 test.obj"
Note: the sub instruction appears twice.
0000003C 55 push rbp
0000003D 4889E5 mov rbp,rsp
00000040 48895DF0 mov [rbp-0x10],rbx
00000044 4883EC10 sub rsp,byte +0x10
00000048 4883EC20 sub rsp,byte +0x20
0000004C B9EDFE0000 mov ecx,0xfeed
00000051 BA0A000000 mov edx,0xa
00000056 41B8EFBE0000 mov r8d,0xbeef
0000005C E800000000 call qword 0x61
00000061 4883C420 add rsp,byte +0x20
00000065 4831C0 xor rax,rax
00000068 5B pop rbx
00000069 4889EC mov rsp,rbp
0000006C 5D pop rbp
0000006D C3 ret
Also: is it normal that the invoke macro loads constant parameters into 32-bit registers?
Bye.
-
Note: the sub instruction appears twice.
0000003C 55 push rbp
0000003D 4889E5 mov rbp,rsp
00000040 48895DF0 mov [rbp-0x10],rbx
00000044 4883EC10 sub rsp,byte +0x10 ; <-- allocates USES storage space, created by USES macro
00000048 4883EC20 sub rsp,byte +0x20 ; <-- allocates parameter space, created by INVOKE macro
0000004C B9EDFE0000 mov ecx,0xfeed
00000051 BA0A000000 mov edx,0xa
00000056 41B8EFBE0000 mov r8d,0xbeef
0000005C E800000000 call qword 0x61
00000061 4883C420 add rsp,byte +0x20 ; <-- restores parameter space, performed by INVOKE macro
00000065 4831C0 xor rax,rax
00000068 5B pop rbx
00000069 4889EC mov rsp,rbp
0000006C 5D pop rbp
0000006D C3 ret
INVOKE cannot know whether you have actual code between it and the prior USES/LOCALS macros.
You may wish to use NASMX_PRAGMA CALLSTACK, 32 prior to your proc definition to see how your stack pointer is better optimized.
Also: is it normal that the invoke macro loads constant parameters into 32-bit registers?
Yes, if the value fits in 32 bits, it will use the shorter opcodes.
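The shorter form is safe because, in 64-bit mode, writing a 32-bit register clears the upper 32 bits of the corresponding 64-bit register. A minimal sketch with a hypothetical label:

```nasm
bits 64

; Hypothetical demo label, not part of NASMX.
zero_extend_demo:
    mov  rcx, -1           ; rcx = 0xFFFFFFFFFFFFFFFF
    mov  ecx, 0xfeed       ; 32-bit write zero-extends: rcx = 0x000000000000FEED
    mov  rax, rcx          ; returns 0xFEED
    ret
```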
-
Okay, thanks, it's clear, for now. :)