NASM - The Netwide Assembler

NASM Forum => Programming with NASM => Topic started by: azagaros on May 26, 2019, 08:17:51 PM

Title: Segmentation fault question
Post by: azagaros on May 26, 2019, 08:17:51 PM: This is on 64-bit Linux. I am getting a segmentation fault while trying to read the command-line arguments and print them out. I am not sure where my logic error is or which boundary I am crossing. As far as I know, my management of the stack is working.
Code: [Select]
section .data
    strNewLine db 10

section .text
    global _start

_start:
    pop rcx                 ; pop the count of arguments
.argloop:
    pop rax                 ; pop the ptr of the arg.
    call _printstr          ; print the string
    call _printnl           ; print a new line character
    dec rcx
    cmp rcx, 0
    jne .argloop
    call _ExitProg
    ret

;rax ptr to string to count
;rbx count of string
_printstr:
    push rcx
    push rax
    mov rbx,0
.lenloop:
    inc rax
    inc rbx
    mov cl, [rax]
    cmp cl, 0
    jne .lenloop
    mov rax, 1
    mov rdi, 1
    pop rsi                 ;pop rax to rsi
    mov rdx, rbx
    syscall
    pop rcx
    ret

_printnl:
    mov rax, 1
    mov rdi, 1
    mov rsi, strNewLine
    mov rdx, 1
    syscall
    ret

_ExitProg:
    xor rdi, rdi
    mov rax, 60
    syscall
Title: Re: Segmentation fault question
Post by: Frank Kotler on May 26, 2019, 10:15:39 PM: I am not good at 64-bit code. It looks like rcx is being trashed in one of your routines. Either save rcx across the two routines, or test rax for zero instead of counting on argc.

I think that'll work...

Best,
Frank
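A minimal sketch of the first option, for illustration only. The `syscall` instruction itself overwrites RCX (with the return address) and R11 (with the saved flags), so a counter kept in RCX is lost as soon as a routine makes a system call without saving it. Keeping the count in a callee-saved register avoids that. The register choice (`r12`, `r13`) and the helper name `print_line` are assumptions made for this sketch, not part of the posted code, and it assumes argc is at least 1.

```nasm
; argc.asm - sketch: loop on argc with the counter in a register that
; survives both our own routine and the syscall instruction.
; build: nasm -felf64 -o argc.o argc.asm && ld -o argc argc.o
bits 64
default rel

section .rodata
newline: db 10

section .text
global _start

_start:
    mov   r12d, [rsp]           ; argc (r12 is preserved; rcx and r11 are not)
    lea   r13, [rsp+8]          ; r13 -> argv[0]
.argloop:
    mov   rsi, [r13]            ; current argument
    call  print_line
    add   r13, 8                ; next argv slot
    sub   r12d, 1
    jnz   .argloop
    mov   eax, 60               ; exit(0)
    xor   edi, edi
    syscall

; rsi = NUL-terminated string; writes it to stdout followed by a newline.
print_line:
    xor   edx, edx              ; rdx = length so far
.len:
    cmp   byte [rsi+rdx], 0
    je    .write
    add   rdx, 1
    jmp   .len
.write:
    mov   eax, 1                ; write(1, rsi, rdx)
    mov   edi, 1
    syscall
    mov   eax, 1                ; write(1, newline, 1)
    mov   edi, 1
    lea   rsi, [newline]
    mov   edx, 1
    syscall
    ret
```

The second option, testing the argv pointer for NULL, needs no counter at all.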
Title: Re: Segmentation fault question
Post by: fredericopissarra on May 26, 2019, 10:16:51 PM: Maybe this could help:
Code: [Select]
; test.asm
;
; compile with:
;   $ nasm -felf64 -o test.o test.asm
;   $ ld -o test test.o
;
bits 64
default rel     ; x86-64 default addressing is RIP relative!

; Moved to .rodata.
section .rodata
strNewLine: db `\n`

section .text
global _start

_start:
  ; argc is in [rsp]
  ; argv[n] are in [rsp+8*n+8]
  ;mov ecx,[rsp]          ; argc (not used here!)
  xor ebx,ebx             ; RBX is saved by syscalls (SysV ABI).
                          ; Used as index...
.loop:
  mov rdi,[rsp+rbx*8+8]
  test rdi,rdi            ; argv[rbx] == NULL?
  jz .no_more_args
  call printstr           ; print the string
  call printnl            ; print a new line character
  add ebx,1               ; avoid using INC/DEC (they are slow!).
  jmp .loop
.no_more_args:
  mov eax,60              ; syscall_exit
  xor edi,edi             ; exit(0).
  syscall

; rdi = string ptr.
strlen:
  xor eax,eax
.loop:
  cmp byte [rdi+rax],0
  jz .strlen_end
  add eax,1
  jmp .loop
.strlen_end:
  ret

;rdi = string ptr
printstr:
  call strlen
  mov edx,eax             ; string length.
  mov eax,1               ; syscall_write
  mov rsi,rdi             ; string ptr.
  mov edi,eax             ; stdout
  syscall
  ret

printnl:
  mov eax,1               ; syscall_write
  mov edi,eax             ; stdout
  mov edx,eax             ; 1 char to print
  lea rsi,[strNewLine]    ; points to '\n'.
  syscall
  ret
Title: Re: Segmentation fault question
Post by: azagaros on May 27, 2019, 04:57:14 PM: Thank you, fredericopissarra. You got me past one stumbling block. Just understanding how the registers and the stack are handled on Linux is challenging. A small expansion of the code has created a new segmentation fault.
Code: [Select]
bits 64
default rel     ; x86-64 default addressing is RIP Relative

; section .data
section .rodata
    strNewLine  db 10
    strSpace    db 32
    strFileName db 'is a file name.',0
    strOption   db 'is an option flag.',0

section .text
    global _start

_start:
    xor rbx, rbx            ;zero the rbx- will not get trashed by sys v abi (the system calls)
    ; rsp is the **argv
.argloop:
    mov rdi, [rsp+rbx*8+8]
    test rdi, rdi           ; testing for null
    jz .noMoreArg           ; jump on zero
    cmp rbx, 0              ; consuming the first arg.
    je .skipProcess
    call printstr           ; print the string
    call printSpc           ; print a space
    ; process the arg
    call processArg
    call printnl            ; print a new line character
.skipProcess:
    add rbx, 1              ; increment the index.
    jmp .argloop
.noMoreArg:
    call ExitProg
    ret

; rdi = string ptr
strlen:
    xor rax, rax            ; zero rax register
.loop:
    cmp byte[rdi+rax],0     ;check for null termination of string
    jz .strlenend
    add rax,1
    jmp .loop
.strlenend:
    ret

; rdi = str ptr
printstr:
    call strlen
    mov rdx, rax            ; store the length rdx
    mov rax, 1              ; syscall write
    mov rsi, rdi            ; str ptr
    mov rdi, rax            ; stdout
    syscall
    ret

; rdi = str ptr
printnl:
    mov rax, 1
    mov rdi, rax
    mov rdx, rax
    lea rsi, [strNewLine]
    syscall
    ret

;rdi = str ptr
printSpc:
    mov rax, 1
    mov rdi, rax
    mov rdx, rax
    lea rsi, [strSpace]
    syscall
    ret

; rdi = str ptr to process
processArg:
    xor rax, rax            ;string index position
    cmp byte[rdi+rax],'-'
    je .argOption
    ; is the stack getting trashed? stack alignment? (system calls use the local stack?)
    push rdi                ; save the current rdi (should not change)
    push rax                ; save the working rax (It is getting trashed?) (it gets trashed in the string length function)
    mov rdi, strFileName    ; set up a different rdi...
    call printstr
    pop rax
    pop rdi
    ;Assumed file concept (can have more than one)
    ;allocate a file information block
    ;copy the file name to the file name of the block
.argOption:
    add rax, 1              ;consume the option flag
    ; the test for the list of options.
    push rax
    push rdi
    mov rdi, strOption
    call printstr
    pop rdi
    pop rax
    ret

ExitProg:
    xor rdi, rdi
    mov rax, 60
    syscall
As for you, Kotler, as a NASM developer, some food for thought:
https://eli.thegreenplace.net/2012/01/03/understanding-the-x64-code-models

Everything I am fighting seems to be in the code generation of NASM, or FASM for that matter.
Title: Re: Segmentation fault question
Post by: fredericopissarra on May 27, 2019, 06:06:44 PM: SysV ABI amd64 calling convention:

Integer and pointer arguments are passed to a function, in order, in RDI, RSI, RDX, RCX, R8 and R9; the 7th and later arguments are passed on the stack, cdecl style. For syscalls, R10 is used instead of RCX.

For floating point, XMM0 to XMM7 will be used for arguments.

Example: int f( int a, float b, int c ); // EDI=a, XMM0=b, ESI=c

Registers RSP, RBP, RBX and R12 to R15 must be preserved between calls. Any other registers can be changed inside a function.

RAX is the returned value...
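As a sketch of how the example prototype maps onto registers, here is one possible body for `f`. The body (the two integers added to the truncated float) is invented for illustration; only the register assignment follows from the convention described in this post.

```nasm
; f.asm - build: nasm -felf64 -o f.o f.asm
; C prototype: int f( int a, float b, int c );
bits 64
default rel

section .text
global f

f:
    cvttss2si eax, xmm0     ; eax = (int)b   (first floating point argument)
    add       eax, edi      ; + a            (first integer argument)
    add       eax, esi      ; + c            (second integer argument)
    ret                     ; result is returned in EAX
```

Note that `c` arrives in ESI rather than EDX: integer and floating point arguments are counted separately, so `b` does not use up an integer register.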
Title: Re: Segmentation fault question
Post by: azagaros on May 27, 2019, 06:35:55 PM: First off, tell me where I am touching anything but the system calls. I am not making the function calls. It appears the registers get trashed in a Linux task switch. Assembly has always been an argument over the registers. I save registers if I do not care what they return from a process. I am being specific about the calls and when I do them. Most of the registers I have not touched. I have read the ABI backwards and forwards more times than I care to count. Something is interfering with a position that, by most addressing models, should not move, so any number I move from x to y should not change. If it gets relocated by the OS, the OS should clean up the pointers. I am crossing a boundary because a number changed through some abstraction. My assembly does not change any numbers, and I expect only the entry and exit of a function to change certain numbers. Which stupidity am I really following, doing simple assembly? I have been chasing bugs like this every time I code something on Linux in 64-bit, and I have been doing assembly for 30+ years.
Title: Re: Segmentation fault question
Post by: fredericopissarra on May 27, 2019, 06:59:08 PM: Take a closer look at your printstr function... Which registers does it change?
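One way to read this hint, as a sketch: `printstr` returns with RDI = 1 (the stdout descriptor), RSI and RDX holding the write arguments, RAX holding the syscall result, and RCX and R11 overwritten by `syscall`. In the posted loop, `processArg` is then called with RDI = 1 and reads `byte [rdi]`, which would fault. The version below reloads RDI from argv before each use and returns before `.argOption` instead of falling into it. NUL-terminating the space and newline strings so that `printstr` can print them is a simplification made here, not the original layout.

```nasm
; args.asm - build: nasm -felf64 -o args.o args.asm && ld -o args args.o
bits 64
default rel

section .rodata
strNewLine:  db 10,0
strSpace:    db ' ',0
strFileName: db 'is a file name.',0
strOption:   db 'is an option flag.',0

section .text
global _start

_start:
    xor   ebx, ebx              ; argv index; RBX survives calls and syscalls
.argloop:
    mov   rdi, [rsp+rbx*8+8]
    test  rdi, rdi              ; argv[rbx] == NULL?
    jz    .done
    test  ebx, ebx              ; skip argv[0]
    jz    .next
    call  printstr              ; returns with rdi = 1, not the string pointer
    lea   rdi, [strSpace]
    call  printstr
    mov   rdi, [rsp+rbx*8+8]    ; reload the argument pointer
    call  processArg
    lea   rdi, [strNewLine]
    call  printstr
.next:
    add   ebx, 1
    jmp   .argloop
.done:
    mov   eax, 60               ; exit(0)
    xor   edi, edi
    syscall

; rdi = NUL-terminated string.
; Clobbers rax, rcx, rdx, rsi, rdi and r11.
printstr:
    xor   edx, edx              ; rdx = length so far
.len:
    cmp   byte [rdi+rdx], 0
    je    .write
    add   edx, 1
    jmp   .len
.write:
    mov   rsi, rdi              ; buffer
    mov   eax, 1                ; write
    mov   edi, 1                ; stdout
    syscall
    ret

; rdi = argument string
processArg:
    cmp   byte [rdi], '-'
    je    .argOption
    lea   rdi, [strFileName]
    call  printstr
    ret                         ; without this, execution falls into .argOption
.argOption:
    lea   rdi, [strOption]
    call  printstr
    ret
```

Run as `./args -v file.txt`, this should print `-v is an option flag.` and `file.txt is a file name.` on separate lines.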
Title: Re: Segmentation fault question
Post by: fredericopissarra on May 27, 2019, 07:16:26 PM: Another thing: avoid using R?? registers if you can. If you need a simple counter, 32 bits will suffice and you should use E??. This is because using R?? for anything other than pointers adds a REX prefix to the instructions, and the immediate values can be 64 bits long...

XOR EAX,EAX will clear the entire RAX and it is a shorter instruction... Every instruction that writes to an E?? register automatically zeroes the upper 32 bits of the corresponding R?? register...

To load an effective address (pointer), you should use LEA instruction to take advantage of RIP relative addressing. This:

MOV RDI,strNewLine

is better written as

LEA RDI,[strNewLine]

Because it is wise to prefer Position Independent Executables (PIE) in x86-64 mode.
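A small sketch of both points, with the instruction lengths noted in comments. The byte counts are those of the standard encodings; the label names `msg` and `msglen` are made up for this example.

```nasm
; lea.asm - build: nasm -felf64 -o lea.o lea.asm && ld -o lea lea.o
bits 64
default rel

section .rodata
msg:    db "hi", 10
msglen  equ $ - msg

section .text
global _start

_start:
    lea   rsi, [msg]        ; 48 8D 35 disp32: 7 bytes, RIP-relative
    ; mov rsi, msg          ; 48 BE imm64: 10 bytes, absolute address
    mov   eax, 1            ; write; the 32-bit write also clears the top of RAX
    mov   edi, 1            ; stdout
    mov   edx, msglen
    syscall

    xor   edi, edi          ; 31 FF: 2 bytes, clears all of RDI
    ; xor rdi, rdi          ; 48 31 FF: 3 bytes, same result
    mov   eax, 60           ; exit(0)
    syscall
```

The RIP-relative form keeps absolute addresses out of the code, which is what lets the same object file be linked into a position independent executable without load-time fixups in the text section.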
Title: Re: Segmentation fault question
Post by: azagaros on May 27, 2019, 07:25:38 PM: oh wow. talk about stupidity... rax - 64bit, eax -32 bit, ax - 16 bit al or ah is 8 bit.. of the same register. if I read another post nasm through optimization selects the smallest form. 64 bits addresses are 3 or 4 bytes long, 32 bit are 2. that is code generation, it is in the prefixes on the byte sequence to denote 16, 32, 64 bit number that follows..

it could be your fallacy, if you forget there is a full 64 bit number in a register when you only zero out 32 bits of the register, it does not end up 0..
Title: Re: Segmentation fault question
Post by: fredericopissarra on May 27, 2019, 08:24:38 PM: Quote from: azagaros on May 27, 2019, 07:25:38 PM
oh wow. talk about stupidity... rax - 64bit, eax -32 bit, ax - 16 bit al or ah is 8 bit.. of the same register. if I read another post nasm through optimization selects the smallest form. 64 bits addresses are 3 or 4 bytes long, 32 bit are 2. that is code generation, it is in the prefixes on the byte sequence to denote 16, 32, 64 bit number that follows..

it could be your fallacy, if you forget there is a full 64 bit number in a register when you only zero out 32 bits of the register, it does not end up 0..
So, what if you read from Intel (https://software.intel.com/en-us/articles/intel-sdm) and AMD (https://developer.amd.com/resources/developer-guides-manuals/) themselves... READ the Development Manuals and see for yourself.
Title: Re: Segmentation fault question
Post by: fredericopissarra on May 27, 2019, 08:36:02 PM: A simple test:
Code: [Select]
; asm.asm
bits 64
default rel

section .text
global f

; unsigned long long f( unsigned long long );
; Entry: RDI
; Output: RAX
f:
  mov rax,rdi
  mov eax,1       ; it will zero the upper 32 bits?
  ret
Code: [Select]
/* test.c */
#include <stdio.h>

extern unsigned long long f( unsigned long long );

int main( void )
{
  unsigned long long x;

  x = f(0xffffffffffffffffULL);   // all bits set.

  // Maybe it will print 0xffffffff00000001?! Nah!
  printf( "%#016llx\n", x );
}
Code: [Select]
# Makefile
test: test.o asm.o
	$(CC) -o $@ $^

test.o: test.c

asm.o: asm.asm
	nasm -felf64 -o $@ $<

Just:
Quote
$ make
$ ./test
Title: Re: Segmentation fault question
Post by: azagaros on May 27, 2019, 09:30:57 PM: Gee your stupidity again. I have 5 programming languages under my belt. I have read the books you dumb a**. From every piece of documentation of registers:
| 32 | 16 | 16| rax is the full 64, eax is the bottom 32 and ax is the last set of 16. rax is 8 bytes, eax 4. Adding two numbers should look and act like this: the furthest-right bit goes first and it cycles to the left. If it overflows the 64 bits it will pop the carry flag. Endianness just changes the direction the cycling goes. This has not changed in 30 years of x86 class processors. I think ARM processors do it the same way. It is the way the transistors are put in sequence. Clearing of the register is left up to the programmer, and most BIOSes leave them empty when they start an OS. Did you not learn how to add two numbers early in your life? Two's complement follows the same logic, if you have ever done the math by hand. You start at the zero bit and go to the max bit, and big and little endian determine where the 0 bit is.

I know that implying a register in NASM does not guarantee the size of the register. If you want the smallest code concept, it is in the preamble on any integer or pointer, by sequence of any command. In code generation, by both the AMD and Intel documentation, you have the byte code of the command, then, depending on the command, the scaler on a number and the bytes that make up the number. All address space is pointers, which are all integers. NASM could take a 64 bit pointer and inadvertently make it a 16 bit pointer, if the value was under 2^16. A 16 bit number has no scaler and is only 2 bytes for the number, a 32 has a 2 byte scaler with 4 bytes to the number and a 64 has a 3 byte scaler and 8 bytes to the number. There is a lot of space saving in this concept. It also argues the byte concept of the command.

One thing that is in AS-- gcc's assembler. It has a denotation to imply which state of command you use. For example mov, movw, movq, mov is the 8 bit/16 bit concept, and movw is the 32 bit concept and movq is the 64 bit concept.
Title: Re: Segmentation fault question
Post by: debs3759 on May 27, 2019, 09:47:53 PM: azagaros, please do not insult someone who is trying to help you. You may know several programming languages, but that does not mean you know all languages. Higher-level languages do much of the work for you, so you don't need to know as much in order to use them. I have never tried programming in 64-bit code, but fredericopissarra appears to have done so. As he says, the Intel/AMD documentation gives much more info than those of us who work on NASM, who used that info to write an app which follows Intel assembly coding conventions. Our documentation will never be as complete as what you can get (or maybe have) from Intel.
Title: Re: Segmentation fault question
Post by: fredericopissarra on May 27, 2019, 09:53:21 PM: Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 1, Chapter 3, Section 3.4.1.1 (General-Purpose Registers in 64-Bit Mode):

"When in 64-bit mode, operand size determines the number of valid bits in the destination general-purpose
register:
...
* 32-bit operands generate a 32-bit result, zero-extended to a 64-bit result in the destination general-purpose
register."

AMD64 Architecture Programmer’s Manual Volume 1 - Application Programming - Chapter 3, Topic 3.1.2 (64-Bit Mode Registers), Figure 3-3 and 3-4... Topic 3.1.2.3 explains...
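The zero-extension rule quoted here applies to 32-bit operands only; 8-bit and 16-bit writes leave the rest of the register unchanged. A sketch that extends the earlier `f` test to show both cases follows. The function names `write32` and `write16` are made up for this illustration and would need matching `extern` declarations in the C harness.

```nasm
; zext.asm - build: nasm -felf64 -o zext.o zext.asm
bits 64
default rel

section .text
global write32
global write16

; unsigned long long write32( unsigned long long );
write32:
    mov rax, rdi
    mov eax, 1      ; 32-bit write: bits 32..63 of RAX become zero
    ret             ; write32(0xffffffffffffffff) returns 0x0000000000000001

; unsigned long long write16( unsigned long long );
write16:
    mov rax, rdi
    mov ax, 1       ; 16-bit write: bits 16..63 of RAX are left alone
    ret             ; write16(0xffffffffffffffff) returns 0xffffffffffff0001
```

This is why `xor eax,eax` is enough to clear all of RAX, while `xor ax,ax` would not be.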
Title: Re: Segmentation fault question
Post by: fredericopissarra on May 27, 2019, 09:56:16 PM: By the way... I have almost 40 years of software development AND electronics experience, not only with assembly (for more than a dozen processors), but with lots of high-level languages as well...

You're welcome.
Title: Re: Segmentation fault question
Post by: azagaros on May 27, 2019, 10:24:38 PM: Debs, all languages boil down to the machine language on any processor. Five of them have been the most common ones I have programmed in over 30 years. All other languages have the same basic logic to them, and if one can pick up on the syntax of a language, one can pick out the structure common to all of them because of the machine-language concept. It is like how I can read Lisp but cannot program in it. The concepts from the 8086, the 6502 or the 68000, from various companies, follow similar machine-language constructs. Even the current forms of ARM V follow simple abstractions of those earlier ones. For example, the simple move from memory to register is common to all of them.