Topic: how to create an array

Nairda Gnieob

« on: July 07, 2009, 01:13:51 PM »
Hi,

ASM newbie here, I want to know how to create an array on the stack.

Basically, in C:
main() {
 int x;
 char array[16];
 x = 3;
 function_that_modifies_data(x,array);
}

how would I do that in NASM?

mov eax,3
push eax
push ??? ; what do i push? how do i declare it?
call function_that_modifies_data

Thanks!

Frank Kotler (NASM Developer)
« Reply #1 on: July 08, 2009, 03:58:02 AM »
Well, you've got "array" as a "local" variable - on the stack, as you intend - so it won't have an address that we can tell Nasm at assemble time. We'll have to calculate it at run time with "lea" (load effective address), and push that. (I'm assuming that "array" is equivalent to "&array[0]" in C - C never passes the array itself as a parameter, only a pointer to its first element.)

This is *not* the output you'd get from a C compiler, but is intended to be a "simplified" version of what C would do. (a decent compiler would produce faster code, but harder to follow)  I didn't know what to do for a function - not a lot you *can* do with an array of characters and an int except use the int as an index, so I just set "array[x]" to 1. A decent compiler would warn us that we're using "array" uninitialized. Okay, not intended to be useful. A useful function might need a third parameter - number of elements in the array.

; nasm -f elf array.asm
; gcc -o array array.o

global main

section .text
main:
    push ebp
    mov ebp, esp
    sub esp, 20  ; 16 characters and an int
    mov dword [ebp - 4], 3 ; x

    lea eax, [ebp - 20] ; address of array[0]
    push eax
    push dword [ebp - 4]
    call function_that_modifies_data
    add esp, 4 * 2 ; remove the two parameters

    mov esp, ebp
    pop ebp
    ret

function_that_modifies_data:
    push ebp
    mov ebp, esp

    mov eax, [ebp + 8] ; x
    mov ecx, [ebp + 12] ; address of array

    mov byte [ecx + eax], 1 ; modify data

    mov esp, ebp
    pop ebp
    ret
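
For comparison, a minimal C sketch of the same callee, with the third parameter (number of elements) that a useful function would want. The name modify_data_checked, the len parameter, and the -1 error return are assumptions made for illustration; they are not part of the code above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bounds-checked variant of function_that_modifies_data.
   "array" arrives as a pointer to the first element (the value that
   "lea" computes in the assembly); "len" is the assumed third parameter. */
int modify_data_checked(int x, char *array, size_t len)
{
    assert(array != NULL);
    if (x < 0 || (size_t)x >= len)
        return -1;          /* index outside the array: touch nothing */
    array[x] = 1;           /* same store as "mov byte [ecx + eax], 1" */
    return 0;
}
```

A caller would pass the array name and its size, e.g. modify_data_checked(x, array, sizeof array). In the assembly version that would mean one more push before the call and "add esp, 4 * 3" after it.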

Depending on how "new" you are, it may be easier to not have "array" on the stack, to begin with. This still uses it uninitialized - "section .bss" is uninitialized data. If we wanted it initialized, put it in "section .data" as "array db 1, 2, 3, 4, 5..." or whatever.

; nasm -f elf array2.asm
; ld -o array2 array2.o

global _start

section .data
    x dd 3

section .bss
    array resb 16

section .text
_start:
    push array ; address of array
    push dword [x]
    call function_that_modifies_data
    add esp, 4 * 2

    mov eax, 1 ; sys_exit
    mov bl, 42 ; exit status (the kernel keeps only the low byte)
    int 80h

function_that_modifies_data:
    mov eax, [esp + 4] ; x
    mov ecx, [esp + 8] ; address of array

    mov byte [ecx + eax], 1 ; modify data

    ret

The first example shows how to use "lea" to get the address of a variable on the stack, which I guess is what you're looking for.

Best,
Frank

Nairda Gnieob

« Reply #2 on: July 08, 2009, 07:54:15 AM »
Thanks a lot for your reply Frank, it was very helpful. I managed to get the version using the 'bss' segment going.

I am having some trouble getting the 'stack' version going. (I get a segfault). First, a little more on my problem: I am trying to write a boot loader, and I want to print some numbers to the screen, so I thought I would write an itoa routine in C and then link it in.. First step, try it in userspace.

There are a few things I don't understand:
-Where does esp come from? what does it really mean? I know it is a 'stack' pointer, but I presume the stack is set up by the OS? so without an OS does esp mean something at all? how does it 'grow'?
-Why do we need to push&pop ebp?
-What exactly does LEA do? How is this different from MOV?

Here is my code as it stands:

extern printf
extern n2s

SECTION .data
x dd 3
fmt db "%s", 10, 0 ; 10 is /n, 0 is string null

SECTION .text
global main
main:
   push ebp   ;save ebp, why?
   mov ebp,esp   ;where does esp come from?

   sub esp,20   ;create some space. where does this space come from?
   mov dword[ebp -4], 3

   lea eax,[ebp-20];how is this different from mov?

   push eax
   push dword [ebp - 4]
   call n2s
   add esp, 4*2 ;once for function, once for return value?

   push eax   ;is this right?
   push dword fmt
   call printf
   add esp, 12 ;i don't know why this is 12, i just copied it from someone else

   mov esp,ebp ;again no idea why we do this and what this does
   pop ebp ; why do we restore this

   mov eax,0 ;return 0
   ret

Frank Kotler (NASM Developer)
« Reply #3 on: July 09, 2009, 12:14:36 AM »
> Thanks a lot for your reply Frank, it was very helpful. I managed to get the
> version using the 'bss' segment going.

Good start.

> I am having some trouble getting the 'stack' version going. (I get a segfault).

:) I got a bunch of segfaults, before I figured out my stupid typo. I was trying to set the local variable to 3 with "mov byte [ebp + 4], 3". Local variables are at "[ebp - ???]", not '+'!!! I was clobbering main's return address. Sheesh!!!

> First, a little more on my problem: I am trying to write a boot loader,

I don't want to discourage you, but writing a boot loader isn't an easy thing to do. Worse, serious OS developers usually use GRUB (Grand Unified Bootloader), or similar, instead of writing their own. "GRUB does everything I was going to do when I got to it, only better", as one developer explained it. However, it is "educational", and "fun" to see the computer controlled entirely by your own code (and maybe a little BIOS). Maybe I'm just easily amused. Many people want to do it... (for rather small values of "many", I guess. :)

> and
> I want to print some numbers to the screen,

Useful.

> so I thought I would write an itoa
> routine in C and then link it in..

Link it to... your boot loader? Good luck.

> First step, try it in userspace.

Right. There, if it fails, we can run it in a debugger to see what's going wrong.

> There are a few things I don't understand:

Me too. More than a few! :)

> -Where does esp come from?

Intel. :)

> what does it really mean?

Extended Stack Pointer - esp is the "plain English" name, the "true name" is a 3-bit number - 100b for esp.

> I know it is a 'stack'
> pointer, but I presume the stack is set up by the OS?

Yeah... but not if you don't have an OS.

> so without an OS does
> esp mean something at all?

Yes. This is a "hardware thing". Besides being a "general purpose register", esp is an implied register for several instructions (as edx:eax are implied with mul). The most "obvious", I guess, are push and pop. But the most "important", IMO, are call and ret. "call" puts the return address - the address immediately after the call instruction and its operand - on the stack (essentially "sub esp, 4" then "mov [esp], return_address") and jumps to a new address. When "ret" is encountered, the action is essentially "pop eip" - the return address is removed from the stack, and we go there. If esp isn't pointing to the return address when we hit "ret", we go someplace else - usually without good results!

"The stack" is an area of ordinary memory, "special" only in that ss:esp point to it. We've always got a stack - it's wherever ss:esp point to. It's our responsibility (or the OS's) to make sure it's someplace "sane" - where our code/data won't clobber it, or be clobbered by it. If the OS puts it somewhere sane, we have to keep it there.

> how does it 'grow'?

Down. We start with esp (ss:esp) at the top of the memory area we've reserved for "the stack". Pushes and calls subtract something from esp, and we subtract something for local variables. When we "free" the local variables ("mov esp, ebp"), ret and pop, the stack grows back up.
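
A toy model in C may make the "grows down" idea concrete. This is only a sketch under stated assumptions: the stack is an array of 32-bit words, "esp" is an index into it rather than a byte address, and the names toy_stack, toy_init, toy_push, and toy_pop are invented for illustration. The bounds checks are extras; the real CPU does none.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 64

/* Toy model: "mem" is the memory reserved for the stack and "esp" is
   the index of the current top-of-stack word. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    unsigned esp;
} toy_stack;

void toy_init(toy_stack *s)
{
    assert(s != NULL);
    s->esp = STACK_WORDS;       /* empty: esp starts at the top of the area */
}

void toy_push(toy_stack *s, uint32_t value)
{
    assert(s != NULL && s->esp > 0);
    s->esp -= 1;                /* like "sub esp, 4" (one word here) */
    s->mem[s->esp] = value;     /* like "mov [esp], value" */
}

uint32_t toy_pop(toy_stack *s)
{
    assert(s != NULL && s->esp < STACK_WORDS);
    uint32_t value = s->mem[s->esp];    /* read the top word */
    s->esp += 1;                        /* like "add esp, 4" */
    return value;
}
```

In this model "call" is a toy_push of the return address followed by a jump, "ret" is a toy_pop into the instruction pointer, and "sub esp, 20" simply lowers esp without storing anything.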

> -Why do we need to push&pop ebp?

The "Intel ABI" (Application Binary Interface) requires it. Sounds like a "hardware thing", but it's just a "convention". Registers ebx, esi, edi, and ebp are expected to be preserved across calls. There's a "hardware reason" for these registers - they're the ones usable for addressing in 16-bit mode (as bx, si, di, and bp). In 32-bit code, entering a function, we can:

push ebp
mov ebp, esp
mov eax, [ebp + 8] ; get first parameter

Or, we could skip that, and just:

mov eax, [esp + 4] ; get first parameter

In 16-bit code, which is what you'll be writing for your bootloader, [sp] is not a valid addressing mode, but [bp] is. So there's a reason we do it that way.

> -What exactly does LEA do?

Arithmetic. Load Effective Address calculates the second operand, which must have the form of a valid effective address ("[...]" required in Nasm syntax), and puts it in the first operand. No memory is touched. You can't do arbitrary arithmetic with it, but it doesn't actually have to be an "address". Here's the complement to your number2string routine:

atoi:
    mov edx, [esp + 4]  ; pointer to string
    xor eax, eax        ; clear "result"
.top:
    movzx ecx, byte [edx]
    inc edx
    cmp ecx, byte '0'
    jb .done
    cmp ecx, byte '9'
    ja .done

    ; we have a valid character - multiply
    ; result-so-far by 10, subtract '0'
    ; from the character to convert it to
    ; a number, and add it to result.

    lea eax, [eax + eax * 4]
    lea eax, [eax * 2 + ecx - 48]

    jmp short .top
.done:
    ret
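
The same loop written out in C, to show the arithmetic the two "lea" instructions perform. This is a sketch; the name atoi_sketch is invented here, and like the assembly it handles no sign and checks no overflow.

```c
#include <assert.h>
#include <stddef.h>

/* C rendering of the atoi loop above: stops at the first character
   that is not a decimal digit. */
unsigned atoi_sketch(const char *s)
{
    assert(s != NULL);
    unsigned result = 0;
    for (;;) {
        unsigned c = (unsigned char)*s++;   /* movzx ecx, byte [edx] / inc edx */
        if (c < '0' || c > '9')             /* jb .done / ja .done */
            break;
        result = result + result * 4;       /* lea eax, [eax + eax * 4]      -> result * 5 */
        result = result * 2 + c - 48;       /* lea eax, [eax * 2 + ecx - 48] -> * 10 + digit */
    }
    return result;
}
```

Each "lea" is one addition-and-scale of the shape base + index * scale + displacement, which is why the multiply by 10 has to be split into "times 5" and "times 2".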

> How is this different from MOV?

Doesn't move anything... except the result of the calculation into the specified register. Consider:

mov eax, [ebp - 4] ; get contents of "x" - 3
lea eax, [ebp - 20] ; get address of "array"

There used to be an instruction reference in the Nasm manual, but it's gone (no one wanted to maintain it). I retained the old, unmaintained version:

http://home.myfairpoint.net/fbkotler/nasmdocr.html

There are better instruction set references available... Famously, the Nasm manual doesn't mention effects on flags, which is good to know. Intel and AMD have manuals, of course - at certain times, they'll offer to ship you a printed one, free!

> Here is my code as it stands:
>
> extern printf
> extern n2s
>
> SECTION .data
> x dd 3
> fmt db "%s", 10, 0 ; 10 is /n, 0 is string null

"\n", you mean. I make that mistake *all* the time! :)

Nasm will, in recent versions, accept "\n" and similar C-isms, if you enclose your string in "back quotes":

fmt db `%s\n`, 0

"10" seems easier to me. :)

> SECTION .text
> global main
> main:
>    push ebp   ;save ebp, why?

'Cause your program will crash if you don't. main's caller is using it.

>    mov ebp,esp   ;where does esp come from?

From main's caller - crt0.o or such. The "startup code" puts envp, argv, and argc on the stack, and calls main... so the return address is on the stack, too.

>    sub esp,20   ;create some space. where does this space come from?

From the memory area we, or the OS, have reserved as "stack". We give it back at the end.

>    mov dword[ebp -4], 3

This is where I screwed up. I wrote "[ebp + 4]"! That would be main's return address - not a good one to clobber!

>    lea eax,[ebp-20];how is this different from mov?
>
>    push eax
>    push dword [ebp - 4]
>    call n2s
>    add esp, 4*2 ;once for function, once for return value?

Once for each of the two parameters we pushed.

>    push eax   ;is this right?

Might be. We want the address of "array". We had that in eax before we called n2s. That "Intel ABI" expects the return value (not the return address) in eax (or edx:eax). I fixed *my* n2s so it returned the buffer it was given in eax. What's in eax when *your* n2s returns? Might *not* be right... (do the lea thing again, or something).

>    push dword fmt
>    call printf
>    add esp, 12 ;i don't know why this is 12, i just copied it from someone else

You must have copied it from someplace that passed three parameters. We only have two, so should be "8". This might cause a crash, BUT...

>    mov esp,ebp ;again no idea why we do this and what this does

Besides "freeing" our local variables "x" and "array" (this is why they call 'em "automatic" storage class), this covers our a** if we screwed up esp above - we don't really have to "clean up the stack" by removing the parameters we pushed after each function call, we can "defer" it. This puts esp back where it was right after we pushed ebp, so when we pop ebp and get to ret, main's return address will be the next thing on the stack. All of this assumes that we haven't trashed ebp!

>    pop ebp ; why do we restore this

Because main's caller is using it (or might be). And because we pushed it.

>    mov eax,0 ;return 0
>    ret

Here's my n2s, which returns "buffer" (array), and works with the above code (I changed "12" to "8", but I don't think it matters).

;---------------
global n2s

n2s:
    push ebp
    mov ebp, esp

    push ebx ; ABI wants 'em preserved!
    push edi

    mov eax, [ebp + 8] ; number to convert
    mov edi, [ebp + 12] ; buffer for result
    mov ebx, 10 ; decimal conversion
    xor ecx, ecx ; counter
.pushloop:
    xor edx, edx
    div ebx
    push edx
    inc ecx
    cmp eax, 0
    jnz .pushloop
.poploop:
    pop eax
    add al, '0'
    stosb
    loop .poploop
    mov byte [edi], 0

    ; since our caller expects this to return "buffer", do so.

    mov eax, [ebp + 12]

    pop edi
    pop ebx

    mov esp, ebp
    pop ebp
    ret
;---------------
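
Since the original plan was an itoa routine in C, here is a rough C equivalent of the n2s above. It is a sketch, not a drop-in replacement: the name n2s_sketch and the temporary digits array (standing in for the pushes) are assumptions for illustration, and the caller is assumed to supply at least 11 bytes of buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unsigned decimal conversion that returns "buffer", like n2s above.
   The buffer must hold up to 10 digits plus the terminating zero. */
char *n2s_sketch(uint32_t number, char *buffer)
{
    assert(buffer != NULL);
    char digits[10];            /* 4294967295 has 10 digits */
    size_t count = 0;
    do {
        /* div ebx / push edx: the remainder is the next digit, least
           significant first ('0' is added here rather than after the pop) */
        digits[count++] = (char)('0' + number % 10);
        number /= 10;
    } while (number != 0);      /* cmp eax, 0 / jnz .pushloop */

    char *out = buffer;
    while (count > 0)
        *out++ = digits[--count];   /* pop / stosb: most significant digit first */
    *out = '\0';                    /* mov byte [edi], 0 */
    return buffer;
}
```

With the 16-byte "array" from the earlier code as the buffer, n2s_sketch(3, array) leaves the string "3" in it and returns its address, which is what the printf call expects.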

This isn't going to help you display numbers in your bootloader, being 32-bit code. The 16-bit version would look about the same... I think I'd cut 'n paste it, rather than try to link, but maybe I just don't know how to do it.

Best,
Frank