NASM - The Netwide Assembler
NASM Forum => Using NASM => Topic started by: soren on August 19, 2011, 06:27:32 AM
-
Is this possible? I was reading through the docs and couldn't find a particular way to do this. I'm able to disassemble an exe, but the output looks very different from NASM source. The output was also really huge, even for a hello world C program I compiled myself.
-
Well... the output of Ndisasm is intended more for "examination" than "reassembly". It is possible, with some minor tweaking (delete the first 20 bytes of each line... and other unneeded stuff... and add some needed stuff), to get it to reassemble. But any attempt to modify it would most likely fail. With a whole lot more work, it is possible to get something that can be modified, reassembled, and still work. (reassembling the unmodified disassembly is pointless - you've got the .exe!) With a very small executable, this might be practical, but as you observe, even a simple executable bloats up fast. For practical purposes, "no, you can't do that". Easier to write the code that does what the program needs to do from scratch (consulting the disassembly if you have to).
Ndisasm doesn't attempt to be "clever" - doesn't know about "executable formats"... headers and all. Just attempts to disassemble everything as if it were code. Agner Fog's "objconv" knows "disassembly" as one of its object formats, and knows executable header formats! You might find it interesting to disassemble your simple C hello world with it. Objconv knows what all the parts are, and it will help explain why it's so big!
http://www.agner.org
Edit: more complete URL:
http://www.agner.org/optimize/#objconv
Best,
Frank
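The "delete the address and opcode columns" step Frank describes could be sketched roughly as below. This is a minimal Python sketch, not a feature of ndisasm; the function name strip_ndisasm_columns is made up for illustration, and it assumes each ndisasm line has the form "offset, hex bytes, instruction" separated by whitespace. Continuation lines (the ones starting with "-" that carry overflow opcode bytes) are dropped. The result still has numeric jump targets instead of labels, so as Frank says it is only a starting point.

```python
import re

# offset column, opcode-bytes column, then the instruction text
_NDISASM_LINE = re.compile(r"^([0-9A-Fa-f]{8,16})\s+([0-9A-Fa-f]+)\s+(\S.*)$")


def strip_ndisasm_columns(listing):
    """Return only the instruction text from ndisasm-style output lines."""
    instructions = []
    for line in listing.splitlines():
        match = _NDISASM_LINE.match(line.rstrip())
        if match:  # continuation lines and blank lines do not match
            instructions.append(match.group(3))
    return instructions


if __name__ == "__main__":
    sample = (
        "00000000  0000              add [eax],al\n"
        "00000002  55                push ebp\n"
    )
    print("BITS 32")
    for text in strip_ndisasm_columns(sample):
        print(text)
```

Matching on the column pattern rather than cutting a fixed number of characters keeps the sketch independent of the exact column widths ndisasm uses.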
-
Thanks for the thorough answer. Yeah, I'm aware of the "pointlessness" of disassembling a file and assembling it again. It was more of a curious question than anything else. I'm interested in how translators work and was wondering how much asm source could be recovered.
-
Some years ago I played around with a disassembler called "Borg", or something similar to that. Probably the most interesting part of it was that it generated assembler source which was almost completely compatible with MASM. The only thing you really had to work to fix in the disassembled code was the data types (the disassembler would make any section not the code section into a massive collection of DB statements which, for readability purposes had to be hand fixed.)
-
good fun.
i emaciated the hello world program down to the smallest thing that would assemble and got it to this:
global _main
section .text
_main:
then i assembled it and played around with disassembling it.
i had no idea how much "stuff" the linker puts into the exe file. for example, this is the output from objconv of the .obj file for the above code:
; Disassembly of file: nasm_bones.obj
; Mon Aug 22 23:28:05 2011
; Mode: 32 bits
; Syntax: YASM/NASM
; Instruction set: 80386
global _main
.absolut equ 00000000H ; 0
@feat.00 equ 00000001H ; 1
SECTION .text align=16 execute ; section number 1, code
this is the output of it after being put through the linker:
; Disassembly of file: nasm_bones.exe
; Mon Aug 22 23:39:46 2011
; Mode: 32 bits
; Syntax: YASM/NASM
; Instruction set: SSE2
global Entry_point: function
extern GetCommandLineA ; near; KERNEL32.dll
extern HeapSetInformation ; near; KERNEL32.dll
extern SetUnhandledExceptionFilter ; near; KERNEL32.dll
extern GetProcAddress ; near; KERNEL32.dll
extern GetModuleHandleW ; near; KERNEL32.dll
extern ExitProcess ; near; KERNEL32.dll
extern DecodePointer ; near; KERNEL32.dll
extern WriteFile ; near; KERNEL32.dll
extern GetStdHandle ; near; KERNEL32.dll
extern GetModuleFileNameW ; near; KERNEL32.dll
extern GetModuleFileNameA ; near; KERNEL32.dll
extern FreeEnvironmentStringsW ; near; KERNEL32.dll
extern WideCharToMultiByte ; near; KERNEL32.dll
extern GetEnvironmentStringsW ; near; KERNEL32.dll
extern SetHandleCount ; near; KERNEL32.dll
extern InitializeCriticalSectionAndSpinCount ; near; KERNEL32.dll
extern GetFileType ; near; KERNEL32.dll
extern GetStartupInfoW ; near; KERNEL32.dll
extern DeleteCriticalSection ; near; KERNEL32.dll
extern EncodePointer ; near; KERNEL32.dll
extern TlsAlloc ; near; KERNEL32.dll
extern TlsGetValue ; near; KERNEL32.dll
extern TlsSetValue ; near; KERNEL32.dll
extern TlsFree ; near; KERNEL32.dll
extern InterlockedIncrement ; near; KERNEL32.dll
extern SetLastError ; near; KERNEL32.dll
extern GetCurrentThreadId ; near; KERNEL32.dll
extern GetLastError ; near; KERNEL32.dll
extern InterlockedDecrement ; near; KERNEL32.dll
extern HeapCreate ; near; KERNEL32.dll
extern QueryPerformanceCounter ; near; KERNEL32.dll
extern GetTickCount ; near; KERNEL32.dll
extern GetCurrentProcessId ; near; KERNEL32.dll
extern GetSystemTimeAsFileTime ; near; KERNEL32.dll
extern LeaveCriticalSection ; near; KERNEL32.dll
extern EnterCriticalSection ; near; KERNEL32.dll
extern LoadLibraryW ; near; KERNEL32.dll
extern UnhandledExceptionFilter ; near; KERNEL32.dll
extern IsDebuggerPresent ; near; KERNEL32.dll
extern TerminateProcess ; near; KERNEL32.dll
extern GetCurrentProcess ; near; KERNEL32.dll
extern HeapFree ; near; KERNEL32.dll
extern Sleep ; near; KERNEL32.dll
extern GetCPInfo ; near; KERNEL32.dll
extern GetACP ; near; KERNEL32.dll
extern GetOEMCP ; near; KERNEL32.dll
extern IsValidCodePage ; near; KERNEL32.dll
extern RtlUnwind ; near; KERNEL32.dll
extern HeapSize ; near; KERNEL32.dll
extern HeapAlloc ; near; KERNEL32.dll
extern HeapReAlloc ; near; KERNEL32.dll
extern LCMapStringW ; near; KERNEL32.dll
extern MultiByteToWideChar ; near; KERNEL32.dll
extern GetStringTypeW ; near; KERNEL32.dll
extern IsProcessorFeaturePresent ; near; KERNEL32.dll
SECTION .text align=1 execute ; section number 1, code
?_0001: ; Local function
; Filling space: 2H
; Filler type: mov with same source and destination
; db 8BH, 0FFH
ALIGN 2
push ebp ; 00401002 _ 55
mov ebp, esp ; 00401003 _ 8B. EC
cmp dword [?_0982], 2 ; 00401005 _ 83. 3D, 00408B28(d), 02
jz ?_0002 ; 0040100C _ 74, 05
call ?_0074 ; 0040100E _ E8, 00000691
?_0002: push dword [ebp+8H] ; 00401013 _ FF. 75, 08
call ?_0063 ; 00401016 _ E8, 000004DA
push 255 ; 0040101B _ 68, 000000FF
call ?_0025 ; 00401020 _ E8, 000001EA
pop ecx ; 00401025 _ 59
pop ecx ; 00401026 _ 59
pop ebp ; 00401027 _ 5D
ret ; 00401028 _ C3
?_0003: ; Local function
push 20 ; 00401029 _ 6A, 14
push ?_0922 ; 0040102B _ 68, 00407860(d)
call ?_0235 ; 00401030 _ E8, 000012EB
xor esi, esi ; 00401035 _ 33. F6
cmp dword [?_1059], esi ; 00401037 _ 39. 35, 004098BC(d)
jnz ?_0004 ; 0040103D _ 75, 0B
push esi ; 0040103F _ 56
push esi ; 00401040 _ 56
push 1 ; 00401041 _ 6A, 01
push esi ; 00401043 _ 56
call near [imp_HeapSetInformation] ; 00401044 _ FF. 15, 00406004(d)
?_0004: mov eax, 23117 ; 0040104A _ B8, 00005A4D
cmp word [Unnamed_80000000_0], ax ; 0040104F _ 66: 39. 05, 00400000(d)
jz ?_0006 ; 00401056 _ 74, 05
?_0005: mov dword [ebp-1CH], esi ; 00401058 _ 89. 75, E4
jmp ?_0007 ; 0040105B _ EB, 36
?_0006: mov eax, dword [Unnamed_80000000_0] ; 0040105D _ A1, 0040003C(d)
cmp dword [Unnamed_80000000_0+eax], 17744 ; 00401062 _ 81. B8, 00400000(d), 00004550
jnz ?_0005 ; 0040106C _ 75, EA
mov ecx, 267 ; 0040106E _ B9, 0000010B
cmp word [Unnamed_80000000_0+eax], cx ; 00401073 _ 66: 39. 88, 00400018(d)
jnz ?_0005 ; 0040107A _ 75, DC
cmp dword [Unnamed_80000000_0+eax], 14 ; 0040107C _ 83. B8, 00400074(d), 0E
jbe ?_0005 ; 00401083 _ 76, D3
xor ecx,D, FC
pop edi ; 00401698 _ 5F
pop esi ; 00401699 _ 5E
xor ecx, ebp ; 0040169A _ 33. CD
pop ebx ; 0040169C _ 5B
call ?_0416 ; 0040169D _ E8, 00001BF7
that is just a little bit of it! in all it goes 9000 lines! :o
-
part of output from borg:
;
; Created by Borg Disassembler
; written by Cronos
1000:00401000 ;-----------------------------------------------------------------------
1000:00401000 ;Segment : 1000h Offset : 401000h Size : 4400h
1000:00401000 ;32-bit Code
1000:00401000 ;-----------------------------------------------------------------------
1000:00401000 ; XREFS First: 1000:0040109e Number : 3
1000:00401000 loc_00401000:
1000:00401000 8bff mov edi, edi
1000:00401002 55 push ebp
1000:00401003 8bec mov ebp, esp
1000:00401005 833d288b400002 cmp dword ptr [loc_00408b28], 02h
1000:0040100c 7405 jz loc_00401013
1000:0040100e e891060000 call loc_004016a4
1000:00401013 ; XREFS First: 1000:0040100c Number : 1
1000:00401013 loc_00401013:
1000:00401013 ff7508 push dword ptr [ebp+08h]
1000:00401016 e8da040000 call loc_004014f5
1000:0040101b 68ff000000 push 0ffh
1000:00401020 e8ea010000 call loc_0040120f
1000:00401025 59 pop ecx
1000:00401026 59 pop ecx
1000:00401027 5d pop ebp
1000:00401028 c3 ret
1000:00401029 ; XREFS First: 1000:0040118f Number : 1
1000:00401029 loc_00401029:
1000:00401029 6a14 push 14h
1000:0040102b 6860784000 push offset loc_00407860
1000:00401030 e8eb120000 call loc_00402320
1000:00401035 33f6 xor esi, esi
1000:00401037 3935bc984000 cmp dword ptr [loc_004098bc], esi
1000:0040103d 750b jnz loc_0040104a
1000:0040103f 56 push esi
1000:00401040 56 push esi
1000:00401041 6a01 push 01h
1000:00401043 56 push esi
1000:00401044 ff1504604000 call dword ptr [HeapSetInformation]
1000:0040104a ; XREFS First: 1000:0040103d Number : 1
1000:0040104a loc_0040104a:
1000:0040104a b84d5a0000 mov eax, 5a4dh
1000:0040104f 66390500004000 cmp word ptr [400000h], ax
1000:00401056 7405 jz loc_0040105d
1000:00401058 ; XREFS First: 1000:0040106c Number : 3
1000:00401058 loc_00401058:
1000:00401058 8975e4 mov [ebp-1ch], esi
1000:0040105b eb36 jmp loc_00401093
1000:0040105d ; XREFS First: 1000:00401056 Number : 1
1000:0040105d loc_0040105d:
1000:0040105d a13c004000 mov eax, dword ptr [40003ch]
1000:00401062 81b80000400050450000 cmp dword ptr [eax+400000h], 4550h
1000:0040106c 75ea jnz loc_00401058
1000:0040106e b90b010000 mov ecx, 10bh
1000:00401073 66398818004000 cmp [eax+400018h], cx
1000:0040107a 75dc jnz loc_00401058
1000:0040107c 83b8740040000e cmp dword ptr [eax+400074h], 0eh
1000:00401083 76d3 jbe loc_00401058
1000:00401085 33c9 xor ecx, ecx
1000:00401087 39b0e8004000 cmp [eax+4000e8h], esi
1000:0040108d 0f95c1 setnz cl
1000:00401090 894de4 mov [ebp-1ch], ecx
1000:00401093 ; XREFS First: 1000:0040105b Number : 1
1000:00401093 loc_00401093:
1000:00401093 e85c120000 call loc_004022f4
1000:00401098 85c0 test eax, eax
1000:0040109a 7508 jnz loc_004010a4
1000:0040109c 6a1c push 1ch
1000:0040109e e85dffffff call loc_00401000
1000:004010a3 59 pop ecx
1000:004010a4 ; XREFS First: 1000:0040109a Number : 1
1000:004010a4 loc_004010a4:
1000:004010a4 e8d0100000 call loc_00402179
1000:004010a9 85c0 test eax, eax
1000:004010ab 7508 jnz loc_004010b5
1000:004010ad 6a10 push 10h
1000:004010af e84cffffff call loc_00401000
1000:004010b4 59 pop ecx
1000:004010b5 ; XREFS First: 1000:004010ab Number : 1
1000:004010b5 loc_004010b5:
1000:004010b5 e87a0d0000 call loc_00401e34
1000:004010ba 8975fc mov [ebp-04h], esi
1000:004010bd e82d0b0000 call loc_00401bef
1000:004010c2 85c0 test eax, eax
1000:004010c4 7908 jns loc_004010ce
-
Yep, you'll find that when dealing with C, a lot of behind-the-scenes work has to happen to get things set up the way the language expects. These "stubs" are one of the main reasons I like writing in assembly rather than C. Your program doesn't actually start at _main; that is a function called by libc's startup files once they have configured the C environment. The true entry point is at _start, and you can actually create your own to reduce code size. Check this out on a 32-bit x86 GNU/Linux system with NASM and C.
First, create the following two files (hello.c and loader.asm):
#include <stdio.h>
int main(int argc, char *argv[], char *envp[])
{
printf( "Hello, World!\n" );
return ( 0 );
}
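BITS 32
SECTION .text
EXTERN main
GLOBAL _start
_start:
pop ecx ; ecx = argc (top of the initial stack)
mov esi, esp ; esi = argv (the pointer array follows argc)
push ecx ; put argc back where it was
lea eax, [(esi+4)+(ecx*4)] ; eax = envp (just past argv's NULL terminator)
push eax ; third argument to main: envp
push esi ; second argument: argv
push ecx ; first argument: argc
call main
add esp, 12 ; drop the three arguments (cdecl)
xor ebx, ebx ; ebx = 0
xor eax, ebx ; eax unchanged; first step of an XOR swap
xor ebx, eax ; ebx = main's return value
xor eax, ebx ; eax = 0
inc eax ; eax = 1 (sys_exit)
int 0x80 ; exit with status ebx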
Now compile and run the C version without the loader:
bash-4.1$ gcc -o hello1 hello.c
bash-4.1$ ./hello1
Hello, World!
bash-4.1$
Now assemble the loader and compile it against the C version (and remove the startup code from GCC):
bash-4.1$ nasm -f elf -o loader.o loader.asm
bash-4.1$ gcc -nostartfiles -o hello2 hello.c loader.o
bash-4.1$ ./hello2
Hello, World!
bash-4.1$
Great! So both versions seem to work the same. Let's take a look at the listing.
bash-4.1$ ls -o1
total 24
-rw-r--r-- 1 bkeller 116 2011-08-24 03:08 hello.c
-rwxr-xr-x 1 bkeller 5525 2011-08-24 03:15 hello1
-rwxr-xr-x 1 bkeller 2019 2011-08-24 03:19 hello2
-rw-r--r-- 1 bkeller 248 2011-08-24 03:12 loader.asm
-rw-r--r-- 1 bkeller 512 2011-08-24 03:19 loader.o
bash-4.1$
Wow! Look at that, we actually shaved 3506 bytes off the executable (5525 down to 2019) just by using a custom startup routine. ;) Since you're on Windows, the code will be a bit different, but the idea is the same. Basically, libc's startup code needs to grab argc, argv, and sometimes envp (not standard) and invoke the C main() function; afterwards it needs to invoke the system's exit routine. One thing you'll notice: Win32's command-line function (GetCommandLine) returns the command line as a single string, so the startup code has to include a routine that parses it into an array of character pointers before main can start. This adds even more overhead to the final executable (in our Linux environment, argc, argv, and envp are available directly on the stack and only need to be located).
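The "only need to be located" part comes down to a little address arithmetic, which can be modelled as below. This is an illustrative Python sketch of what the first few instructions of loader.asm compute, not code from the thread; the name locate_main_args and the example stack address are made up, and it assumes the usual 32-bit Linux initial stack (argc at the stack pointer, then the argv pointers, a NULL, then the envp pointers).

```python
WORD = 4  # bytes per stack slot on 32-bit x86


def locate_main_args(initial_esp, argc):
    """Return the (argv, envp) addresses for a 32-bit Linux initial stack.

    Assumed layout: [esp] = argc, then argc argv pointers, a NULL
    terminator, then the envp pointers.
    """
    argv = initial_esp + WORD        # pop ecx / mov esi, esp
    envp = argv + WORD * (argc + 1)  # lea eax, [(esi+4)+(ecx*4)]
    return argv, envp


if __name__ == "__main__":
    # Hypothetical stack address, chosen only for the example.
    argv, envp = locate_main_args(0xBFFFF000, 2)
    print(hex(argv), hex(envp))
```

With argc = 2 the argv array occupies two slots plus its NULL terminator, so envp sits three words past argv.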
-
Hi,
I have a question which fits well to this thread.
I am following the instructions provided in this online resource here:
http://www.phreedom.org/solar/code/tinype/
It talks about minimizing the PE file size by omitting unused header fields from MZ and PE Header.
However, I am stuck at this section: "Switching to assembly and removing the DOS stub".
How have they disassembled the executable C program into an assembly source mentioned there (tiny.asm)?
I have tried using:
ndisasm -b 32 tiny.exe, but it gives a different format.
Also, I used the tiny.asm file provided there and saved it.
Ran the command:
nasm -f bin -o tiny.exe tiny.asm
It throws the following error messages:
tiny.asm:71: warning: macro `sectalign' exists, but not taking 0 parameters
tiny.asm:69: error: symbol `sectalign' undefined
tiny.asm:71: error: symbol `sectalign' undefined
tiny.asm:80: error: symbol `sectalign' undefined
tiny.asm:106: error: symbol `sectalign' undefined
It would be great if you could get me past this step and help me in understanding the concept of disassembling better.
Regards,
NeonFlash
-
We don't do malware here, and the site you link to veers off in that direction. Your question does not involve the malware part, so you're okay. Do not discuss malware here - the moderation team is not in a tolerant mood!
The disassembly of the executable from "tiny.c" is most easily done using Agner Fog's "objconv". To do it in ndisasm:
ndisasm -b32 -e1D0h tiny.exe
"tiny.asm" was apparently written before "sectalign" was added to Nasm as a built-in macro.
http://www.nasm.us/xdoc/2.09.10/html/nasmdoc4.html#section-4.11.13
It uses "sectalign" as:
sectalign equ 4
This apparently confuses Nasm. Call it something else (I used "salign"), and I think you'll find that it assembles without complaint. To disassemble this with ndisasm (no point to it):
ndisasm -b32 -e0Ch tiny.exe
Further options to ndisasm would prevent it from disassembling the cruft after the instructions. Because of the non-standard header (I assume), "objconv" segfaults(!) on this puppy.
Best,
Frank
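For a conventionally laid-out PE file, a skip value for ndisasm's -e option can be derived from the headers instead of being found by eye. The following is a minimal Python sketch, not part of ndisasm or objconv; the name entry_point_file_offset is made up for illustration. It assumes a well-formed PE32 image whose entry point lies inside one of the listed sections, and it will probably not cope with the overlapped headers of a hand-shrunk tiny.exe.

```python
import struct


def entry_point_file_offset(image):
    """Return the file offset of the entry point of a PE32 image (bytes)."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (pe,) = struct.unpack_from("<I", image, 0x3C)            # e_lfanew
    if image[pe:pe + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    (nsections,) = struct.unpack_from("<H", image, pe + 6)   # NumberOfSections
    (opt_size,) = struct.unpack_from("<H", image, pe + 20)   # SizeOfOptionalHeader
    (entry_rva,) = struct.unpack_from("<I", image, pe + 24 + 16)  # AddressOfEntryPoint
    table = pe + 24 + opt_size                               # first section header
    for index in range(nsections):
        # VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData
        vsize, vaddr, raw_size, raw_ptr = struct.unpack_from(
            "<IIII", image, table + 40 * index + 8
        )
        if vaddr <= entry_rva < vaddr + max(vsize, raw_size):
            return entry_rva - vaddr + raw_ptr
    raise ValueError("entry point is not inside any section")
```

The returned number, written in hex, is what would be passed to -e; the tool still knows nothing about where the code ends, which is why the "cruft" after the instructions gets disassembled too.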
-
Thanks Frank for your help.
I apologize for that but my intention was not to post something related to malware here. I found that exercise interesting and it helped me learn more about the PE file format and what are the different header fields involved in it.
I was able to assemble the file, tiny.asm using NASM by replacing sectalign with salign. Thanks :)
I am still confused about the assembly source mentioned there.
When I disassemble the C program executable using ndisasm, it gives the output as shown below:
00000000 0000 add [eax],al
00000002 0000 add [eax],al
00000004 0000 add [eax],al
00000006 0000 add [eax],al
00000008 0000 add [eax],al
0000000A 0000 add [eax],al
0000000C 0000 add [eax],al
0000000E 0000 add [eax],al
00000010 0000 add [eax],al
00000012 0000 add [eax],al
00000014 0000 add [eax],al
00000016 0000 add [eax],al
00000018 0000 add [eax],al
0000001A 0000 add [eax],al
0000001C 0000 add [eax],al
0000001E 0000 add [eax],al
00000020 0000 add [eax],al
00000022 0000 add [eax],al
It's giving me the Memory Address Offset / Opcodes / Assembly Language Instruction format.
However, I am looking for a way to convert it to the following format:
; tiny.asm
BITS 32
;
; MZ header
;
; The only two fields that matter are e_magic and e_lfanew
mzhdr:
dw "MZ" ; e_magic
dw 0 ; e_cblp UNUSED
dw 0 ; e_cp UNUSED
dw 0 ; e_crlc UNUSED
dw 0 ; e_cparhdr UNUSED
dw 0 ; e_minalloc UNUSED
dw 0 ; e_maxalloc UNUSED
dw 0 ; e_ss UNUSED
dw 0 ; e_sp UNUSED
dw 0 ; e_csum UNUSED
dw 0 ; e_ip UNUSED
dw 0 ; e_cs UNUSED
dw 0 ; e_lsarlc UNUSED
dw 0 ; e_ovno UNUSED
times 4 dw 0 ; e_res UNUSED
dw 0 ; e_oemid UNUSED
dw 0 ; e_oeminfo UNUSED
times 10 dw 0 ; e_res2 UNUSED
dd pesig ; e_lfanew
;
; PE signature
;
pesig:
dd "PE"
;
; PE header
;
pehdr:
dw 0x014C ; Machine (Intel 386)
dw 1 ; NumberOfSections
dd 0x4545BE5D ; TimeDateStamp UNUSED
dd 0 ; PointerToSymbolTable UNUSED
dd 0 ; NumberOfSymbols UNUSED
dw opthdrsize ; SizeOfOptionalHeader
dw 0x103 ; Characteristics (no relocations, executable, 32 bit)
;
; PE optional header
;
filealign equ 1
sectalign equ 1
%define round(n, r) (((n+(r-1))/r)*r)
opthdr:
dw 0x10B ; Magic (PE32)
db 8 ; MajorLinkerVersion UNUSED
db 0 ; MinorLinkerVersion UNUSED
dd round(codesize, filealign) ; SizeOfCode UNUSED
dd 0 ; SizeOfInitializedData UNUSED
dd 0 ; SizeOfUninitializedData UNUSED
dd start ; AddressOfEntryPoint
dd code ; BaseOfCode UNUSED
dd round(filesize, sectalign) ; BaseOfData UNUSED
dd 0x400000 ; ImageBase
dd sectalign ; SectionAlignment
dd filealign ; FileAlignment
dw 4 ; MajorOperatingSystemVersion UNUSED
dw 0 ; MinorOperatingSystemVersion UNUSED
dw 0 ; MajorImageVersion UNUSED
dw 0 ; MinorImageVersion UNUSED
dw 4 ; MajorSubsystemVersion
dw 0 ; MinorSubsystemVersion UNUSED
dd 0 ; Win32VersionValue UNUSED
dd round(filesize, sectalign) ; SizeOfImage
dd round(hdrsize, filealign) ; SizeOfHeaders
dd 0 ; CheckSum UNUSED
dw 2 ; Subsystem (Win32 GUI)
dw 0x400 ; DllCharacteristics UNUSED
dd 0x100000 ; SizeOfStackReserve UNUSED
dd 0x1000 ; SizeOfStackCommit
dd 0x100000 ; SizeOfHeapReserve
dd 0x1000 ; SizeOfHeapCommit UNUSED
dd 0 ; LoaderFlags UNUSED
dd 16 ; NumberOfRvaAndSizes UNUSED
I am not sure what this format is called specifically. I assume this is the assembly source. Is this written manually or is there a way to get this type of format by disassembling?
Thanks for your patience in taking the time to help me out :)
Regards,
NeonFlash
-
No apology necessary. There's no actual malicious code there. Unfortunately(?) they chose "download and execute" as the "challenge"... and they use a "trick" to do it, so there's really no "code" involved...
What you show above is plain assembly source, and it's "hand-written" - there's no way a disassembler could know that some fields are supposed to be "dw" and others "dd"... let alone "rounded"... In the executable, it's just a big clump of bytes. A lot of information is lost in going from source (any language) to an executable. Agner Fog's "objconv" knows about ELF and PE headers 'cause Agner told it - the information is not actually in the executable. Also, "objconv" puts the address and opcode bytes on the right, as a comment, and the disassembled instructions on the left - so the thing can actually be assembled. Neat tool!
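The information loss is easy to demonstrate. In this small Python sketch (an editorial illustration, not something from the Tiny PE page), the PE signature is built three ways, mirroring a single dd, a pair of dw, and four db in NASM source. The variable names are made up for the example.

```python
import struct

# Three different "source" spellings of the same four bytes (little-endian).
as_dd = struct.pack("<I", 0x00004550)       # like: dd "PE"
as_dw = struct.pack("<HH", 0x4550, 0x0000)  # like: dw "PE", 0
as_db = bytes([0x50, 0x45, 0x00, 0x00])     # like: db 'P', 'E', 0, 0

if __name__ == "__main__":
    # All three print the same hex string; nothing in the bytes records
    # which spelling the author used.
    for blob in (as_dd, as_dw, as_db):
        print(blob.hex())
```

Since the three byte strings are identical, a disassembler looking only at the file has no basis for choosing among them, let alone for recovering names like e_lfanew or expressions like round(filesize, sectalign).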
That "Tiny PE" page says:
We'll disassemble our 468 byte C program and convert it to assembly source that can be assembled with NASM.
They don't mention that "convert" involves rewriting it "by hand". For practical purposes, the answer remains: "no, you can't do that".
Best,
Frank