Author Topic: Relative addressing confusion (Read 12618 times)

kcghost · « **on:** November 16, 2013, 03:31:21 PM »

I am currently taking flat binary out of NASM, stuffing it into a C char array, and executing it as a function. I like tricky crap. It's fun.

It's easy to do this in C on Linux; you just have to mark that memory as executable.
Here is the C code I am using (got a start from this page: http://www.daniweb.com/software-development/c/threads/353077/store-binary-code-in-memory-then-execute-it):
main.c

Code: [Select]

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#include "asm.inc"

int main(int argc, char**argv)
{
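    /* Round hello's address down to its page boundary (assumes 4 KiB pages). */
    void *addr = (void *)((unsigned long)hello & ~0xfffUL);
    /* Make the pages executable, from the page start through the end of the array. */
    int ans = mprotect(addr, (size_t)((char *)hello + sizeof hello - (char *)addr), PROT_READ | PROT_WRITE | PROT_EXEC);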

    if (ans)
    {
        perror("mprotect");
        exit(EXIT_FAILURE);
    }

    printf("before\n");
    ((void(*)(void))hello)();
    printf("after\n");

    return 0;
}

Notice the #include "asm.inc" line: I run 'xxd -i' on the NASM binary to turn it into a C-style char array, and just #include it.
Here are the commands I use in Makefile form:

Code: [Select]

test:
	nasm -f bin -O0 hello.asm; \
	xxd -i hello > asm.inc; \
	gcc main.c; \
	./a.out; \
	echo "hello"; \
	ndisasm -b 64 hello

Here is some assembler code for 64 bit Linux (a hello world, got a start here: http://blog.markloiseau.com/2012/05/64-bit-hello-world-in-linux-assembly-nasm/):
hello.asm

Code: [Select]

BITS 64

SECTION .data
msg:	db "kaboom",0x0A,0x0D
len:	equ $-msg

SECTION .text
;mov    rsi, [rel msg]    ; message address
call dword 0x5
pop rsi
add rsi,0x2A

mov    rax, 1        ; sys_write                   
mov    rdi, 1        ; stdout
mov    rdx, len    ; message string length
syscall

ret

Notice the 'ret' at the end, so that the code behaves like a function that returns.

This code, as is, works. On a 64 bit Linux machine, it will execute and print out "before\nkaboom\nafter".
The problem is I am not sure I know exactly why it works, or why I can't do it differently. It gets tricky with the message address that needs to end up in RSI.

The code as it stands is based on the disassembly of the char array of the original C code I found at http://www.daniweb.com/software-development/c/threads/353077/store-binary-code-in-memory-then-execute-it. I experimented with the value added to RSI to get it working.

It uses 'call' and 'pop' to simply move to the next line and get RIP into RSI. The 'pop' line is located in the disassembly at 0x5, so I assume the RIP it got was essentially 0x4 relative to the beginning of this code. The message is located in the disassembly at 0x2E. 0x2E - 0x4 = 0x2A. I am guessing the value needed to be 0x2A because that is the difference between the 'call' line and the message.

But notice the commented out 'mov rsi, [rel msg]'. If you uncomment that, and comment out the 'call','pop' and 'add', it does not work.
But why doesn't it? As far as I understand, it should populate RSI with the address of msg relative to RIP, just as the previous code was essentially doing.

So why is

Code: [Select]

mov    rsi, [rel msg]

not equivalent to

Code: [Select]

call dword 0x5
pop rsi
add rsi,0x2A

?

Thanks for any help.

Frank Kotler · « **Reply #1 on:** November 17, 2013, 11:14:25 AM »

Code: [Select]

mov    rsi, [rel msg]

Why the square brackets around "[rel msg]"?

Best,
Frank

kcghost · « **Reply #2 on:** November 17, 2013, 04:00:56 PM »

I found this documentation regarding effective addresses: http://www.nasm.us/doc/nasmdoc3.html#section-3.3

Basically, you can enclose expressions in square brackets to produce an effective address, which makes it easy to do arithmetic on addresses.
It also mentions that by default all addresses are absolute, but this can be changed either with the REL keyword or by changing the default with the 'DEFAULT' directive.

But I found out why I was confused:
I accidentally took 'effective address' to mean 'value of address'.

Code: [Select]

mov rsi,[msg]

tries to move the memory at address 'msg' into RSI. That produces a segfault, because the address of msg is 0x2B in this case.

Code: [Select]

mov rsi,[rel msg]

moves the memory at msg, addressed relative to RIP, into RSI. So 'kaboom' itself was in RSI, not the address of 'kaboom'. Somehow, 'kaboom' interpreted as a pointer did not segfault, but it is still wrong.

Code: [Select]

mov rsi,msg

moves the actual value of msg (0x2B) into RSI, which does not produce a segfault, but is still wrong, as that is not a valid address of the string.

What I need is the value of the effective address [rel msg] to end up in rsi.
As far as I can tell, the only way to get this is to calculate it in code with 'call','pop' and 'add'.
Which is fine, but I am curious, is there a more efficient way?

encryptor256 · « **Reply #3 on:** November 17, 2013, 04:09:20 PM »

Hi!

32-bit, print the message like this:

Code: [Select]

mov	edx,4		; message length
mov	ecx,msg     ; message to write
mov	ebx,1		; file descriptor (stdout)
mov	eax,4		; system call number (sys_write)
int	0x80		; call kernel

Code, from: Linux System Calls

64-bit, print the message like this:

Code: [Select]

mov rdi,qword 0x01
mov rsi,qword messageAddress
mov rdx,qword messageLen
mov rax,qword 0x01
syscall

Code, here: 1.1 Linux System Calls

Edit:

Quote

What I need is the value of the effective address [rel msg] to end up in rsi.

Code: [Select]

mov rsi,msg

Should be fine.

Edit:

You are missing an entry point label and sys_exit.

Your link has a perfect example: 2. “Hello World” in 64-bit Linux assembly

Bye!
