Topic: nasm general questions

Offline xdcx

  • Jr. Member
  • *
  • Posts: 12
nasm general questions
« on: August 14, 2013, 01:24:15 AM »
1. Why do some nasm programs (in online tutorials) declare their entry point with 'global _start', when only 'global main' seems to work for me? With 'global _start' nasm returns errors.

2. If I call sys_write (int 80h) with the variables laid out in this manner

Code: [Select]
section .data
msg1: db "message one",10,0
msg2: db "message two",10,0
len1: equ $-msg1
len2: equ $-msg2

it will incorrectly print 'msg1' as 'message one message two', because it combines the first variable with the second one.
You correct it by moving each len under its msg, but why does it do this when len1 and msg1 are moved to separate registers for sys_write?

Code: [Select]
    mov edx,    len1
    mov ecx,    msg1
    mov ebx,    1
    mov eax,    4
    int 80h

Why does msg1 combine with msg2? Do any of the other variables combine? This seemed surprising to me, like a gotcha.
When calling printf you don't need to store the length; (I think) printf works it out automatically.

3. Is there no difference between the two lines in each of these pairs?

Code: [Select]
mov BYTE [ecx], 0  ;1
mov [ecx], BYTE 0  ;2

Code: [Select]
mov BYTE [ecx], 0  ;1
mov BYTE PTR [ecx], 0   ;2

4. What is PTR used for in NASM when there are square brackets [ ] ?

5. Are registers (EAX, EBX, etc.) initially set to 0? Is it pointless to 'mov eax, 0' to set eax to 0?

6. 'extern printf' works for me, but online tutorials show 'extern _printf', etc., with a leading underscore. '_printf' won't work for me. Why is it different for me?

7. what is the difference between the two:

Code: [Select]
mov eax, 1
mov ebx, 0
int 80h

Code: [Select]
ret
Sometimes when I use `ret` I get a segmentation fault, but with the first one I don't. What is the difference?

I'm on Ubuntu 13.04 64-bit.
« Last Edit: August 14, 2013, 02:46:02 AM by xdcx »

Offline Frank Kotler

  • NASM Developer
  • Hero Member
  • *****
  • Posts: 2667
  • Country: us
Re: nasm general questions
« Reply #1 on: August 14, 2013, 01:37:11 PM »
1) I doubt if Nasm is issuing that error. Nasm should accept any symbol, provided it uses only valid characters (see the manual). Your linker, ld, "knows" _start as the default entrypoint. You can override this with the "-e" switch to ld...
Code: [Select]
global commence
section .text
commence:
; your code
Code: [Select]
nasm -f elf32 myfile.asm
ld -m elf_i386 -o myfile myfile.o -e commence
... should work fine. If ld does not find an entrypoint, either _start or something you specify, it will issue a warning and take a guess where the entrypoint is. This may not be correct! C programmers, for some reason, love to put their subroutines first, and main last. You can do the same...
Code: [Select]
global commence
section .text
my_subroutine:
    mov eax, ebx
    ret
commence:
    ; your code starts...
But you'd better tell ld where your entrypoint is, or it'll guess wrong! (and will crash - see question 7)

If you're using gcc, it will link against some "C startup code" (unless you tell it not to). I think of this as "crt0.o", although I think the correct name is slightly different - we don't need to know, gcc knows. In any case, this file contains the "_start" label. Attempting to add a second one will cause ld to complain. Nasm doesn't care, and I don't think gcc does either. This is ld's job, and it can't make code with two entrypoints!

Besides containing the "_start" label, this "C startup code" does some housekeeping - rearranges environment variables and command line arguments on the stack, among other things - and calls "main". I don't think there's any way to change that name (but see question 6).

2) There's nothing "incorrect" - that's what you told it to do! Nasm uses the '$' character for several different things. In this context, it means approximately "here" - the current address in the assembly. "msg1" is the address in the assembly where it occurs. "$ - msg1" is 26 - count 'em yourself - and that's what sys_write writes. Since '$' means roughly "here", it obviously matters where it occurs. Notice that the zeros are useless in this case - sys_write is not looking for a zero-terminated string (and sys_read won't return one!). "printf" (and other C functions) are looking for a zero-terminated string. It doesn't know the length by magic, it has to calculate it by looking for the zero - every time.
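
To make the '$' arithmetic concrete, here is a minimal sketch, not the original poster's program. It assumes 32-bit Linux with int 80h; the file name, the "_start" entry point and the build line in the comment are assumptions (the "-m elf_i386" switch is only there because the host is 64-bit). Each length is taken immediately after its own string, so "$ - msg" covers that string and nothing else, and the useless zero terminators are dropped:

```nasm
; Assumed build: nasm -f elf32 two_msgs.asm
;                ld -m elf_i386 -o two_msgs two_msgs.o
global _start

section .data
msg1: db "message one", 10
len1: equ $ - msg1          ; '$' is just past msg1, so len1 = 12
msg2: db "message two", 10
len2: equ $ - msg2          ; '$' is just past msg2, so len2 = 12

section .text
_start:
    mov edx, len1           ; number of bytes to write
    mov ecx, msg1           ; address of the first byte
    mov ebx, 1              ; file descriptor 1 = stdout
    mov eax, 4              ; sys_write
    int 80h

    mov edx, len2
    mov ecx, msg2
    mov ebx, 1
    mov eax, 4
    int 80h

    xor ebx, ebx            ; exit code 0
    mov eax, 1              ; sys_exit
    int 80h
```

With the layout from the question, "len1" would instead be evaluated after both strings and come out as 26, which is why both messages appear.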

3 & 4) Nasm doesn't use "ptr" (or "PTR") for anything special. Following a size specifier, it would be a syntax error, but you could use "ptr" for a variable name or a label or a macro name with no problem. I guess it's "Intel syntax". Ask them what it does. Nasm's viewpoint is that it doesn't do anything, and we don't use it. When "translating" Masm/Tasm code to Nasm, you can do:
Code: [Select]
%idefine offset
%idefine ptr
Being defined as "nothing", they effectively disappear from the code and Nasm won't choke on 'em. This can be deceptive - other changes may need to be made! You don't ask, but same idea with "offset" - we don't use it. If it's not in square brackets, it means "offset".
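
A small sketch of the size-specifier point, assuming 32-bit code; the names "buf" and "store_zero" are made up for illustration. The first two stores are the two spellings from question 3 and should assemble to the same instruction. The third is the Masm/Tasm spelling, which gets through only because of the empty %idefine:

```nasm
%idefine ptr                    ; "ptr" now expands to nothing

section .bss
buf: resb 4                     ; hypothetical scratch buffer

section .text
store_zero:
    mov ecx, buf                ; no brackets: the address ("offset") of buf
    mov byte [ecx], 0           ; size given on the memory operand
    mov [ecx], byte 0           ; size given on the immediate, same one-byte store
    mov byte ptr [ecx], 0       ; Masm/Tasm spelling, "ptr" vanishes before assembly
    ret
```

Some size is needed in any spelling: "mov [ecx], 0" alone does not say how many bytes to store, and Nasm will complain that the operation size is not specified.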

5) Initial values of registers at "_start" vary with kernel version. The values are known if you know the kernel version, but unless you're really desperate to save a few bytes, I wouldn't ASSume anything. I'd use "xor eax, eax" (2 bytes) instead of "mov eax, 0" (5 bytes, 4 of them just to store the 0), but it doesn't make much difference. You don't need to zero a register before you load it with another value. If you're using partial registers - al or ah or ax - you may want to have the other bytes in a known state... but you might not need to.
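
For illustration, a sketch of the two ways to zero a register, with the encodings in the comments (the label "zero_eax" is hypothetical). One difference that can matter is that "xor" modifies the flags and "mov" does not:

```nasm
section .text
zero_eax:
    mov eax, 0      ; B8 00 00 00 00 : 5 bytes, 4 of them holding the 0
    xor eax, eax    ; 31 C0          : 2 bytes, also modifies the flags
    mov al, 5       ; writes only the low byte; the upper 24 bits keep
                    ; whatever they held (0 here, because of the xor)
    ret
```

The last instruction is the partial-register case: without the preceding zeroing, eax would hold 5 in its low byte and leftovers in the rest.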

At the "main" label, I imagine it depends on the implementation of C. The (32-bit!) calling convention requires that certain registers - ebp, ebx, esi, and edi - be "preserved" (if you change 'em, put 'em back the way they were). This is not always enforced. This applies if you exit with "ret". If you exit with sys_exit (or "exit()"), it doesn't matter what values are in the registers (except eax and ebx - only bl really counts - for sys_exit) or even if the stack is trashed. You're "supposed" to return a meaningful exit code, but it doesn't break anything if you don't. You can see the last exit code with "echo $?" if you care to.

6) It's different for you because you're using GNU C on Linux. It doesn't put an underscore on names there (the "_start" label being an exception). Other implementations of C, on other platforms, use a leading underscore on external (or global) names. OpenWatcom C uses a trailing underscore - "main_" and "printf_", etc.

Nasm has a handy feature whereby you can use no underscores in your source, and specify "--prefix _" on the command line to put underscores on anything extern or global. "--postfix _" for OpenWatcom users.

7) "ret" returns from a subroutine. It essentially "pops" the return address off the stack, and goes there. The return address has been put on the stack by the "call" instruction. "main" is called, "_start" is not! The first thing on the stack at the "_start" label is "argc" - remember I said the "startup code" monkeyed with the stack? The important thing here is that it calls "main". You can't "ret" from the "_start" label because there's no return address there - you have no choice but sys_exit (or "exit()"). From "main" you can "ret" (if you follow the rules) or sys_exit - your choice. In the case of "ret", that "meaningful exit code" goes in eax, for sys_exit in ebx (for "ret" ebx is supposed to be preserved!). "ret" is an instruction, sys_exit is a kernel service. They have to be used slightly differently.
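
The difference can be sketched like this, assuming 32-bit Linux. Variant A is a complete file meant to be linked with ld alone. Variant B is shown in comments because it belongs in a separate file linked through gcc; its exact build line is not given here since it depends on the toolchain:

```nasm
; Variant A: entry at _start. Nothing called us, so there is no
; return address on the stack and the only way out is sys_exit.
global _start

section .text
_start:
    mov ebx, 0      ; exit code (only the low byte counts)
    mov eax, 1      ; sys_exit
    int 80h

; Variant B (separate file, linked with gcc): the C startup code
; calls main, so a return address is on the stack and "ret" works.
;
; global main
; section .text
; main:
;     xor eax, eax  ; the exit code goes in eax when returning
;     ret
```

Putting a bare "ret" at "_start" pops "argc" as if it were a return address and jumps there, which is the segmentation fault from the question.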

8 ) to be determined :)

Best,
Frank


Offline xdcx

« Reply #2 on: August 14, 2013, 02:56:08 PM »
thank you!

Offline xdcx

« Reply #3 on: August 19, 2013, 09:29:25 PM »
I figured I'd post here instead of starting another thread:

The disassembly of my C code (compiled with gcc) shows this line:
Code: [Select]
jmp    0x400520 <main+52>
<main+52> isn't part of the code but makes it easier to read. What is the 52? Is it the address of main plus 52 bytes, or 52 lines further down?
I would just like to clarify its meaning.

I have mixed code in the disassembly window:
Code: [Select]
int main(){
0x4004ec push   rbp
0x4004ed mov    rbp,rsp
    int arr[5] = {0};
0x4004f0 mov    QWORD PTR [rbp-0x20],0x0
0x4004f8 mov    QWORD PTR [rbp-0x18],0x0
0x400500 mov    DWORD PTR [rbp-0x10],0x0
    int i = 0;
0x400507 mov    DWORD PTR [rbp-0x24],0x0

An int is 4 bytes in C, so why does the compiler use QWORD?
Does this initialize the array faster by zeroing two 4-byte elements in one instruction (mov)?
This will not work on 32-bit processors, right?
« Last Edit: August 19, 2013, 09:32:23 PM by xdcx »

Offline Frank Kotler

« Reply #4 on: August 19, 2013, 09:46:51 PM »
Yeah, it looks like it's initializing two at once. I don't try to figure out why a compiler does what it does. :)

Best,
Frank


Offline dreamCoder

  • Full Member
  • **
  • Posts: 107
Re: nasm general questions
« Reply #5 on: August 31, 2013, 10:50:38 AM »
Hi. I am very new to both NASM and ASM. Instead of wasting a 'new topic', I decided to ask my simple question in here.

Why did I get "error: expected ')'" from this line?

Code: [Select]
mov si, 1+(31 mod 8)
I took this code from another assembler. Thanks.

Offline Frank Kotler

« Reply #6 on: August 31, 2013, 04:40:27 PM »
'Cause Nasm doesn't know "mod". Try:
Code: [Select]
mov si, 1 + (31 % 8)

It's nice that each assembler can use its preferred syntax... but it's a PITA when you know one and try to use another!
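
As a quick check of the arithmetic in that line: 31 % 8 is 7, so the constant works out to 8. The second instruction below is an added illustration, not part of the original question. In Nasm, "/" and "%" are the unsigned division and modulo operators, "//" and "%%" are the signed ones, and the modulo operators should be followed by white space so the preprocessor doesn't mistake them for the start of a macro directive:

```nasm
section .text
    mov si, 1 + (31 % 8)    ; 31 % 8 = 7, so si is loaded with 8
    mov di, 31 / 8          ; unsigned division, di is loaded with 3
```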

Best,
Frank


Offline dreamCoder

« Reply #7 on: September 01, 2013, 04:44:49 AM »
Thanks Frank. I already got the answer. I was actually trying to port my macro files to NASM. Have to deal with so many operator differences.