Author Topic: Segmentation fault error! (Read 44845 times)

maria03 · « **on:** October 06, 2014, 01:47:18 PM »

Hello,
I have a university project: in a first step I have to program an algorithm in C, and in a second step I have to rewrite some of its functions in NASM to improve performance.
The first function I decided to write in NASM was totalCost (it sums the elements of an array).
The NASM function is called from the C program and it has the parameters: cost (the array), n (length of the array), sum (the result)

Code: [Select]

global costTot


cost	equ	8
n	equ	12
sum	equ	16



costTot:
	
	push	ebp			
	mov	ebp, esp		
	push	ebx			
	push	esi
	push	edi
	
	mov	edi, [ebp+cost]	; 
	mov	  esi, [ebp+n]		
	xorps 	xmm2,xmm2		

	mov	ecx,0		;index of the loop
		
	movups	xmm0,[edi]
	
	mov	edx,16

	;if an element is less than 0 it is not added
	
.loop	xorps 	xmm1,xmm1
	cmpltps	xmm1,xmm0
	printregps xmm0
	haddps	xmm0,xmm0		
	haddps	xmm0,xmm0		

	
	addss	xmm2,xmm0
	printregps xmm2
	add	ecx,4 ; single precision, so each register holds 4 numbers


	movups  xmm0,[edi+edx]	;
	add	edx,16		;add 16 for the next four numbers
	
	cmp 	ecx,esi
	jb	.loop
	movups 	[ebp+sum],xmm2
	
	
	pop	edi			; restore the callee-saved registers
	pop	esi
	pop	ebx
	mov	esp, ebp		; restore the stack pointer
	pop	ebp			; restore the base pointer
	ret				; return to the calling C function

When I launch the program I get a segmentation fault and I don't know why.
The code runs in VirtualBox with Ubuntu.
Can someone help me?!
Thanks

NB: I'm sorry for my bad English
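
For reference while debugging, here is a minimal C sketch of what the routine is described as doing: summing a float array while skipping elements less than 0. The name `cost_total_ref` is hypothetical and not part of the project; it is only meant as a scalar baseline to compare the NASM output against.

```c
/* Hypothetical scalar baseline: sums cost[0..n-1], skipping elements
   below zero, and writes the single float result through `sum`. */
void cost_total_ref(const float *cost, int n, float *sum)
{
    float total = 0.0f;
    for (int i = 0; i < n; i++) {
        if (cost[i] >= 0.0f) {   /* "if an element is less than 0 it is not added" */
            total += cost[i];
        }
    }
    *sum = total;                /* exactly one float (4 bytes) is written */
}
```

Calling it with the same array and length as the assembly routine gives a value to check the SSE result against.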

Rob Neff · « **Reply #1 on:** October 06, 2014, 04:16:35 PM »

Quote from: maria03 on October 06, 2014, 01:47:18 PM

Code: [Select]
printregps xmm0

The call to that function ( I'm assuming a macro invocation call ) will probably destroy the contents in ecx and edx which are being used in your loop.
Without seeing how you've implemented that function I can't really state much more.

maria03 · « **Reply #2 on:** October 06, 2014, 05:30:32 PM »

Sorry, you're right!
The professor gave us a file, ssutils, with some macros that help us with the programming. For example, printregps prints the four elements of a register to the console. I'm posting the whole macro file because printregps uses other macros.

Code: [Select]

extern	printf

section	.bss

dbuf:	resq	1

section	.data

imask:	db	'Stampa: %d',13,10,0
dmask:	db		'%f ',0
cr:		db		10,0
br1:	db		'( ',0
br2:	db		')',10,0
align 16
xmmtemp: dd 0.0, 0.0, 0.0, 0.0


%macro	start	0
		push	ebp
		mov	ebp, esp
		pushad
%endmacro

%macro	stop	0
		popad
		mov	esp, ebp
		pop	ebp
		ret
%endmacro

%macro	prints	1
		pushad
		push	%1
		call	printf
		add	esp, 4
		popad
%endmacro

%macro printreg32 1
	pushad
	push	%1
	push	imask
	call	printf
	add	esp, 8
	popad
%endmacro

%macro printregps 1
	vpushax
        movaps [xmmtemp], %1
        printps xmmtemp, 1
	vpopax
%endmacro


%macro	dprint	1
		pushad
		mov		eax,[%1+4]
		push	eax
		mov		eax,[%1]
		push	eax
		push	dmask
		call	printf
		add		esp, 12
		popad
%endmacro

%macro	sprint	1
		finit
		fld		dword [%1]
		fst		qword [dbuf]
		dprint	dbuf
%endmacro

%macro	printss	1
		sprint	%1
		prints	cr
%endmacro

%macro	printps	2
		prints	br1
		push	edx
		push	ecx
		mov	edx, %1
		mov	ecx, %2
%%loopps:
		sprint	edx
		sprint	edx+4
		sprint	edx+8
		sprint	edx+12
		add	edx, 16
		dec	ecx
		jnz	%%loopps
		pop	ecx
		pop	edx
		prints	br2
%endmacro

%macro	vprintps	2
		prints	br1
		push	edx
		push	ecx
		mov		edx, %1
		mov		ecx, %2
%%loopps:
		sprint	edx
		sprint	edx+4
		sprint	edx+8
		sprint	edx+12
		sprint	edx+16
		sprint	edx+20
		sprint	edx+24
		sprint	edx+28
		add		edx, 32
		dec		ecx
		jnz		%%loopps
		pop		ecx
		pop		edx
		prints	br2
%endmacro

%macro	printsd	1
		dprint	%1
		prints	cr
%endmacro

%macro	printpd	2
		prints	br1
		push	edx
		push	ecx
		mov		edx, %1
		mov		ecx, %2
%%looppd:
		dprint	edx
		dprint	edx+8
		add		edx, 16
		dec		ecx
		jnz		%%looppd
		pop		ecx
		pop		edx
		prints	br2
%endmacro

%macro	vprintpd	2
		prints	br1
		push	edx
		push	ecx
		mov		edx, %1
		mov		ecx, %2
%%looppd:
		dprint	edx
		dprint	edx+8
		dprint	edx+16
		dprint	edx+24
		add		edx, 32
		dec		ecx
		jnz		%%looppd
		pop		ecx
		pop		edx
		prints	br2
%endmacro
%macro	vpush	1
	sub	esp, 16
	movups	[esp], %1
%endmacro

%macro	vpop	1
	movups	%1, [esp]
	add	esp, 16
%endmacro

%macro vpushax 0
	vpush	xmm0
	vpush	xmm1
	vpush	xmm2
	vpush	xmm3
	vpush	xmm4
	vpush	xmm5
	vpush	xmm6
	vpush	xmm7
	
%endmacro

%macro vpopax 0
	
	vpop	xmm7
	vpop	xmm6
	vpop	xmm5
	vpop	xmm4
	vpop	xmm3
	vpop	xmm2
	vpop	xmm1
	vpop	xmm0
%endmacro

I'm so sorry, but I only started programming with NASM a few days ago.

Frank Kotler · « **Reply #3 on:** October 06, 2014, 05:42:44 PM »

Thanks Rob. Good catch. I noticed that was a "mystery", but it hadn't occurred to me that it might trash registers (causing your loop not to end and run off into memory you don't "own"... causing a segfault).

This is mostly over my head - I still haven't learned those xmm instructions. I know that some of 'em expect aligned data, and will segfault otherwise. You don't seem to have used those. Messing up the stack so you miss the return address will segfault. You don't seem to have done that. Rob may have caught it.

Something that caught my eye was the way you return the result:

Quote

it has the parameters: cost (the array), n (length of the array), sum (the result)

Code: [Select]

	movups 	[ebp+sum],xmm2

That will put the result on the stack, but will not put it into the C variable, which I assume you want. I think what you want to pass as a parameter is the address of the C variable (&sum ?). In that case, you'd want something like:

Code: [Select]

    mov ecx, [ebp + sum]
    movups [ecx], xmm2

That wouldn't cause a segfault (I don't think), but might cause your code not to work as expected.
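
The difference between writing into the parameter slot and writing through the pointer can be sketched in C. Both function names are made up for illustration; `store_through_pointer` mirrors the `mov ecx, [ebp + sum]` / `movups [ecx], xmm2` pair, reduced to a single float.

```c
/* Hypothetical illustration: changing the parameter itself only touches
   the callee's copy of the argument (its stack slot), so the caller's
   variable is left as it was. */
void store_in_parameter(float *sum, float value)
{
    float local = value;
    sum = &local;          /* overwrites the argument slot, not the caller's variable */
    (void)sum;
}

/* Writing through the pointer updates the variable the caller passed. */
void store_through_pointer(float *sum, float value)
{
    *sum = value;
}
```

On the C side this corresponds to calling the routine with `&sum` (or with a `float*` that already points at valid storage).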

Please remember that this is beyond my skill level and I could be completely off base. Good luck with it!

Best,
Frank

Rob Neff · « **Reply #4 on:** October 07, 2014, 02:15:28 AM »

As Frank noted the way you currently attempt to return the results will cause a GPF.
Also, depending on the address of the cost array you most certainly can GPF due to your use of edx.
Finally, make sure that the address in your C code that will contain the results is at least 16 bytes of storage or you will overwrite the stack and/or potentially GPF.
I've inserted comments into the relevant areas of your code below:

Quote from: maria03 on October 06, 2014, 01:47:18 PM

Code: [Select]
global costTot

cost	equ	8
n	equ	12
sum	equ	16

costTot:
	push	ebp
	mov	ebp, esp
	push	ebx
	push	esi
	push	edi

	mov	edi, [ebp+cost]	;
	mov	esi, [ebp+n]
	xorps	xmm2,xmm2

	mov	ecx,0		;index of the loop

	movups	xmm0,[edi]

	mov	edx,16		;; <-- You initialize edx to 16

	;if an element is less than 0 it is not added

.loop	xorps	xmm1,xmm1
	cmpltps	xmm1,xmm0
	printregps xmm0
	haddps	xmm0,xmm0
	haddps	xmm0,xmm0

	addss	xmm2,xmm0
	printregps xmm2
	add	ecx,4 ; single precision, so each register holds 4 numbers

	movups	xmm0,[edi+edx]	; <-- Get the next number
				; This move will eventually be a memory access of one number beyond end of array!
	add	edx,16		;add 16 for the next four numbers

	cmp	ecx,esi
	jb	.loop
	movups	[ebp+sum],xmm2	; <-- This is certainly not correct!

	pop	edi		; restore the callee-saved registers
	pop	esi
	pop	ebx
	mov	esp, ebp	; restore the stack pointer
	pop	ebp		; restore the base pointer
	ret			; return to the calling C function

I made some very slight adjustments. Note that the code is still a far cry from "optimized". We're more concerned with getting you past the GPF at the moment.

Code: [Select]

costTot:
	
	push	ebp			
	mov	ebp, esp		
	push	ebx			
	push	esi
	push	edi
	
	xorps 	xmm2,xmm2   ; initialize the return value

	mov	edi, [ebp+cost]	;
	mov	  esi, [ebp+n]
	cmp	  esi, 0
	jle  .done    ; prevent a zero or negative length array

	mov	ecx,0		;index of the loop

	xor  edx, edx    ; offset into array

	;if an element is less than 0 it is not added
	
.loop	
	movups	xmm0,[edi + edx]

        xorps 	xmm1,xmm1
	cmpltps	xmm1,xmm0
	printregps xmm0
	haddps	xmm0,xmm0		
	haddps	xmm0,xmm0		

	
	addss	xmm2,xmm0
	printregps xmm2

	add	ecx,4 ; single precision, so each register holds 4 numbers

	add	edx,16		;add 16 for the next four numbers
	
	cmp 	ecx,esi
	jb	.loop

.done:

	mov  ecx, [ebp+sum]
	movups 	[ecx], xmm2   ; Note: make sure that the address in your C code
                                              ; that will contain the results is at least 16 bytes!

	pop	edi			; restore the callee-saved registers
	pop	esi
	pop	ebx
	mov	esp, ebp		; restore the stack pointer
	pop	ebp			; restore the base pointer
	ret				; return to the calling C function

That code is untested so give it a try and report back your results.
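
As a cross-check for the loop above, here is a rough C counterpart using SSE1 intrinsics from `xmmintrin.h` (the header the thread's C code already includes). The function name `cost_total_sse` is hypothetical. It is a sketch under two assumptions: full groups of four floats are loaded 16 bytes at a time, and any leftover elements are added one by one so that no load reaches past the end of the array. Like the loop above, it does not yet filter out negative elements, and it adds the lanes in a different order than the `haddps`/`addss` sequence.

```c
#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE1 intrinsics */
#endif

/* Hypothetical C counterpart: sums cost[0..n-1] and writes one float
   (4 bytes) through `sum`. */
void cost_total_sse(const float *cost, int n, float *sum)
{
    float total = 0.0f;
    int i = 0;
#if defined(__SSE__)
    __m128 acc = _mm_setzero_ps();
    for (; i + 4 <= n; i += 4) {
        acc = _mm_add_ps(acc, _mm_loadu_ps(cost + i));   /* unaligned 16-byte load, like movups */
    }
    float lanes[4];
    _mm_storeu_ps(lanes, acc);                           /* 16-byte store into a 16-byte buffer */
    total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    for (; i < n; i++) {                                 /* scalar tail (whole array if SSE is unavailable) */
        total += cost[i];
    }
    *sum = total;
}
```

Because only `*sum` is written, a destination of a single float is enough here, unlike the 16-byte `movups` store.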

maria03 · « **Reply #5 on:** October 07, 2014, 08:35:47 AM »

I tried these changes but nothing has changed.

I also tried commenting out the printregps macro, but the problem is still there.
Each number is a float, so it is 4 bytes. So I think that the result is smaller than 16 bytes.
I can't understand why the first time I ran the code everything was OK, and the next day when I tried to run it I got a segmentation fault.

maria03 · « **Reply #6 on:** October 07, 2014, 02:33:49 PM »

Maybe another test is to align the address, but I don't know how to do it! Yes, I know that I must use the "aps" instructions, but I don't know what I must do beforehand.

Rob Neff · « **Reply #7 on:** October 07, 2014, 03:26:26 PM »

Post your C code that is calling your assembly function so we can examine it as well.
We will find your bug.

Frank Kotler · « **Reply #8 on:** October 07, 2014, 04:24:33 PM »

Quote

Each number is a float, so it is 4 bytes. So I think that the result is smaller than 16 bytes.

I'm not sure that this is correct. As I understand it, you want just one single-precision float - four bytes - the sum of your array. But unless I'm mistaken (entirely possible) an xmm register holds four such floats - sixteen bytes - and you need to provide storage for that. I don't know how you find the one you want...
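
On the question of which of the four floats holds the result: a small C model of `haddps` may help. The function names are hypothetical, and the arrays simply stand in for xmm registers. `haddps dst, src` produces `{ d0+d1, d2+d3, s0+s1, s2+s3 }`, so applying it twice with the same register as both operands leaves the total in every lane, including lane 0, which is the one `addss` reads.

```c
/* Hypothetical model of haddps dst, src on four-float registers:
   dst = { d0+d1, d2+d3, s0+s1, s2+s3 }. */
void haddps_model(float dst[4], const float src[4])
{
    float r[4] = { dst[0] + dst[1], dst[2] + dst[3],
                   src[0] + src[1], src[2] + src[3] };
    for (int i = 0; i < 4; i++) {
        dst[i] = r[i];           /* copy after computing, so dst == src is fine */
    }
}

/* Two "haddps x, x" in a row: every lane ends up holding the total. */
float horizontal_sum_model(const float v[4])
{
    float x[4] = { v[0], v[1], v[2], v[3] };
    haddps_model(x, x);
    haddps_model(x, x);
    return x[0];
}
```

So the wanted value is the low 4 bytes of the register; the 16 bytes of storage are only needed because `movups` writes the whole register.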

Now that you've provided the macros, I may attempt to assemble your code and fake up a caller to make something I can run (if I can get into the mood). I'll never learn it any younger.

If I were you, I'd stick with the "ups" instructions until you get that working, then advance to the "aps" instructions. Take it one step at a time, would be my advice.

If I may be permitted a totally off-topic comment... I've got some friends visiting Italy and sending back pictures. What a beautiful country!

Best,
Frank

Frank Kotler · « **Reply #9 on:** October 07, 2014, 05:43:59 PM »

Well... no luck. I knew that I wasn't familiar with "haddps", but apparently my CPU is unfamiliar with it too. Crashes with "illegal instruction" after printing one line of zeros. I really must upgrade this beast Real Soon Now.

After commenting out those two lines, it prints two lines of zeros (surrounded by parentheses). I don't think this is correct - shouldn't I see some numbers from my array? It at least exits without a segfault...

Best,
Frank

Code: [Select]

; nasm -f elf32 maria.asm
; ld -o maria maria.o -I/lib/ld-linux.so.2 -lc -melf_i386


%include "ssutils.inc"

global _start

section .data
    carray dd 1.0, 2.0, 3.0, 36.0

section .bss
    csum reso 1

section .text
_start:
    nop
; fake caller    
    push carray
    push 4
    push csum
    call costTot
    add esp, 12

exit:
    mov ebx, eax
    mov eax, 1 ; sys_exit
    int 80h

global costTot


cost	equ	8
n	equ	12
sum	equ	16



costTot:
	
	push	ebp			
	mov	ebp, esp		
	push	ebx			
	push	esi
	push	edi
	
	mov	edi, [ebp+cost]	; 
	mov	  esi, [ebp+n]		
	xorps 	xmm2,xmm2		

	mov	ecx,0		;index of the loop
		
	movups	xmm0,[edi]
	
	mov	edx,16

	;if an element is less than 0 it is not added
	
.loop	xorps 	xmm1,xmm1
	cmpltps	xmm1,xmm0
	printregps xmm0

; this instruction crashes my machine
; obviously, your algorithm won't work without it
; but "just to try"... comment it out
;	haddps	xmm0,xmm0		
;	haddps	xmm0,xmm0		

	
	addss	xmm2,xmm0
	printregps xmm2
	add	ecx,4 ; single precision, so each register holds 4 numbers


	movups  xmm0,[edi+edx]	;
	add	edx,16		;add 16 for the next four numbers
	
	cmp 	ecx,esi
	jb	.loop
;	movups 	[ebp+sum],xmm2
    mov ecx, [ebp + sum]
    movups [ecx], xmm2
	
	
	pop	edi			; restore the callee-saved registers
	pop	esi
	pop	ebx
	mov	esp, ebp		; restore the stack pointer
	pop	ebp			; restore the base pointer
	ret				; return to the calling C function

maria03 · « **Reply #10 on:** October 07, 2014, 09:40:08 PM »

I tried commenting out the haddps instructions and I had the same problem. I am so desperate that I tried commenting out ALL the code, except the first part where I save the register values on the stack and the last part where I pop them back. With only the instruction xorps xmm2,xmm2 left in, I still get the segmentation fault. How is that possible?!
I can't understand it. Maybe it is some problem with VirtualBox?! I can't think of anything else.

Frank Kotler · « **Reply #11 on:** October 08, 2014, 01:35:40 AM »

Anything is possible. Unlikely that it's a problem with Virtualbox. Unlikely that it's a problem with "xorps". But see point 1.

You need a debugger. If you were running your code in a debugger, you'd know where the segfault occurs, at least... so I assume you're not. Learning to run a debugger will take a while, but it's the "right" way to do it...

Here's a "trick": deliberately put the program in an infinite loop...

Code: [Select]

hang: jmp hang

I'd start with that just before the "ret". If it segfaults before it hangs, move it up earlier in the code, but I don't think it will. Kill your program with "control-c", and don't forget to remove that code when you're done with it!

ASSuming that it hangs before it segfaults, we know the problem's on the stack, and/or back in your C code. Rob suggested that you post the C code that calls your little gem. If you don't want to post the whole thing, at least show us the line that calls "costTot", and how the parameters it uses are declared.

If you're still doing:

Code: [Select]

movups [ebp + sum], xmm2

That messes with the stack, and could be a problem - change it to what Rob suggested.

As Betov (a rude Frenchman, author of Rosasm) used to say, "Courage!"

Best,
Frank

maria03 · « **Reply #12 on:** October 08, 2014, 10:09:09 AM »

Sorry, I forgot yesterday to show the C code. I'm afraid I haven't translated the names of the functions, but I have inserted some comments to help you understand the code. The matrix is represented by one block of memory for reasons of efficiency:

Code: [Select]

#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <time.h>
#include <xmmintrin.h>

void costoTotaleN(float* costi, int n,float* somma); // The nasm Function
float* creaMatrice(int n,int m); //create a new matrix with n rows and m columns
void riempiMatrice(float* matrice,int n, int m); // you can enter the elements of the matrix from the keyboard

main(int argc,char* argv[]){
float* costi;
	float* somma;
	int n=8;
	costi=creaMatrice(n,1);
	somma=creaMatrice(1,1);
	riempiMatrice(costi,n,1);
	costoTotaleN(costi,n,somma);
	printf("costo totale %f\n ",somma[0]);
}


float* creaMatrice(int n,int m){
	float* matrice;
	matrice=(float*)malloc(n*m*sizeof(float));

}

void riempiMatrice(float* matrice,int n, int m){
	int i,j;
	int unused __attribute__((unused));
	for (i=0;i<n;i++){
		for(j=0;j<m;j++){
			printf("Inserisci elemento matrice[%d][%d]= ",i,j);
			unused=scanf("%f",&matrice[i*m+j]);
		}

	}	

 }

Frank Kotler · « **Reply #13 on:** October 08, 2014, 02:15:13 PM »

Well that's interesting. The "interesting" part is that it doesn't segfault for me! I don't get the correct answer. That's expected, since I had to comment out the "haddps" instructions. My machine is too old and tired to know that one. I sympathize.

I also don't have xmmintrin.h. Since you don't seem to use it in this version of the C code, I commented that out, too. That was the only change I made to your C code.

This is exactly the .asm file I used:

Code: [Select]

; nasm -f elf32 maria.asm
; ld -o maria maria.o -I/lib/ld-linux.so.2 -lc -melf_i386


%include "ssutils.inc"

%define costTot costoTotaleN

%ifdef TESTMAIN

global _start

section .data
    carray dd 1.0, 2.0, 3.0, 36.0

section .bss
    csum reso 1

section .text
_start:
    nop
; fake caller    
    push carray
    push 4
    push csum
    call costTot
    add esp, 12

exit:
    mov ebx, eax
    mov eax, 1 ; sys_exit
    int 80h

%endif

global costTot


cost	equ	8
n	equ	12
sum	equ	16



costTot:
	
	push	ebp			
	mov	ebp, esp		
	push	ebx			
	push	esi
	push	edi
	
	mov	edi, [ebp+cost]	; 
	mov	  esi, [ebp+n]		
	xorps 	xmm2,xmm2		

	mov	ecx,0		;index of the loop
		
	movups	xmm0,[edi]
	
	mov	edx,16

	;if an element is less than 0 it is not added
	
.loop	xorps 	xmm1,xmm1
	cmpltps	xmm1,xmm0
	printregps xmm0

; this instruction crashes my machine
; obviously, your algorithm won't work without it
; but "just to try"... comment it out
;	haddps	xmm0,xmm0		
;	haddps	xmm0,xmm0		

	
	addss	xmm2,xmm0
	printregps xmm2
	add	ecx,4 ; single precision, so each register holds 4 numbers


	movups  xmm0,[edi+edx]	;
	add	edx,16		;add 16 for the next four numbers
	
	cmp 	ecx,esi
	jb	.loop
;	movups 	[ebp+sum],xmm2
    mov ecx, [ebp + sum]
    movups [ecx], xmm2
	
	
	pop	edi			; restore the callee-saved registers
	pop	esi
	pop	ebx
	mov	esp, ebp		; restore the stack pointer
	pop	ebp			; restore the base pointer
	ret				; return to the calling C function

As you can see, I "defined out" the fake caller and "translated" costTot to the beautiful language. If you're going to try it, you probably want to uncomment the haddpses (try it both ways, perhaps).

There are a couple issues that confuse me. When I run it with my fake caller, I get a couple of lines of zeros printed. When I run it with your C caller, I see the expected numbers printed. (I just entered 1, 2, 3, 4, 5, 6, 7, 8 ) This is apparently an error in my fake caller, but I don't see it. I'm new to this xmm stuff...
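
One possible explanation for the zeros, assuming the usual 32-bit cdecl convention in which arguments are pushed right to left: the fake caller pushes `carray` first and `csum` last, so `[ebp+8]` (read as `cost`) would hold `csum`, which is zero-filled `.bss` memory. The C sketch below models the argument slots with a small array. All names are hypothetical, and it only illustrates the slot order, not the real machine stack.

```c
/* Hypothetical model: slot[0] is what the callee sees at [ebp+8],
   slot[1] at [ebp+12], slot[2] at [ebp+16]. The last value pushed
   sits closest to the return address, i.e. in slot[0]. */
enum { ARG_CARRAY = 1, ARG_N = 2, ARG_CSUM = 3 };

void model_pushes(const int pushed_in_order[3], int slot[3])
{
    for (int i = 0; i < 3; i++) {
        slot[2 - i] = pushed_in_order[i];   /* the first push lands farthest away */
    }
}
```

With the order `push csum`, `push 4`, `push carray`, the slots line up with `cost equ 8`, `n equ 12`, `sum equ 16`.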

What I "thought" when I saw your C code was that "somma" isn't big enough. Try mallocing that to 4 * 1 (or 4 * 0 ?) and see if it makes any difference.

I can't explain why this should segfault for you and not for me. Maybe a problem with Virtualbox, but I doubt it very much. If I feel ambitious I'll fool with it some more, but for right now I'm just puzzled!

Best,
Frank

encryptor256 · « **Reply #14 on:** October 08, 2014, 05:13:33 PM »

Hi Folks!

I think it faults because of memory somewhere.

MOVUPS
Move four unaligned packed single-precision floating-point values between XMM registers or
between an XMM register and memory

So, whatever you do, the movups instruction will LOAD or STORE 16 bytes FROM or TO memory.
Dare to access unallocated space and you're done.

Other thing:

Code: [Select]

costi=creaMatrice(n,1);
somma=creaMatrice(1,1);

...

float* creaMatrice(int n,int m){
	float* matrice;
	matrice=(float*)malloc(n*m*sizeof(float));

}

I don't see the function creaMatrice returning any value. Where is the return? "return matrice;" is missing.
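
A minimal sketch of `creaMatrice` with the missing return added. The check for non-positive sizes is an addition for illustration and is not in the original code.

```c
#include <stdlib.h>

/* Sketch of creaMatrice with the missing return. Without it, the value
   the caller receives is indeterminate, so costi and somma may be
   garbage pointers. Returns NULL for non-positive sizes (an added
   assumption) or when malloc fails. */
float *creaMatrice(int n, int m)
{
    if (n <= 0 || m <= 0) {
        return NULL;
    }
    float *matrice = malloc((size_t)n * (size_t)m * sizeof(float));
    return matrice;
}
```

Using the result of a non-void function that falls off its end is undefined behavior in C. On x86 it can appear to work by accident when the return register still happens to hold malloc's result, which might explain code that runs one day and segfaults the next.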

EDIT:
This code:

Quote

movups [ebp+sum],xmm2

will give seg fault, because your sum is only four bytes.

At least that's what I think, so far!

...
All Rights and Lefts Reserved, I keep rights to modify this post for one reason or another, later or never!

EDIT 2: Here I come again:

Heads up, this might cheer you up: https://www.youtube.com/watch?v=lzUat_wC6Ko

EDIT 3:

How about this instruction, like MOVD.
The MOVD (Move 32 Bits) instruction transfers 32 bits of packed data from memory to an MMX register and vice
versa; or from a general-purpose register to an MMX register and vice versa.

This code looks permitted on both x32 and x64:

Code: [Select]

movd ecx,xmm0

... or....

movd [ebp+sum],xmm0

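So, that code should store only the 32-bit float. Since sum is passed as a pointer, the store has to go through the address held in [ebp+sum] (load it into a register first, as in Rob's version) rather than to [ebp+sum] itself. Should work, theoretically.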
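
The same idea in C, as a hedged sketch: `_mm_store_ss` from `xmmintrin.h` writes only the low lane (4 bytes), so a destination of one float is enough. The helper name `store_low_lane` is hypothetical, and the `lanes` array stands in for the xmm register.

```c
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper: writes only the low lane (4 bytes) of a four-float
   value through `sum`, the way a 32-bit store to [ecx] would after
   mov ecx, [ebp+sum]. */
void store_low_lane(const float lanes[4], float *sum)
{
#if defined(__SSE__)
    _mm_store_ss(sum, _mm_loadu_ps(lanes));   /* 4-byte store of lane 0 */
#else
    *sum = lanes[0];                          /* portable fallback */
#endif
}
```

Memory right after the destination stays untouched, which is the property the 16-byte `movups` store lacks.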
