Print Page - What's wrong in this code please?

Request someone to point out the errors in this code please. [g++, nasm64, windows 10, fastcall]

C++ caller:

extern "C" char* ConCatProc(char*, char*);


int main() {
	char str1[] = "Test string from C++";
	char str2[] = "Test string from NASM64";
	
	char* rax = ConCatProc(str1, str2);
	cout << "Concat: " <<rax<< endl;
	
	return 0;
}

The called Nasm routine:

Code: [Select]

segment .data

	concbuf		times 2 dq " ",'0'

segment .text

	global ConCatProc

ConCatProc:
	pushfq

	mov [concbuf], rcx
	mov [concbuf + 1], rdx

	mov rax, [concbuf]


	popfq
	ret

Appreciate if someone can help correcting the problem.
Thanks.

Maybe what you want is a correct buffer setup with large enough size to hold both strings.

Code: [Select]

section .bss
conBuff: resb 100

Then the next logical steps would be;

0. Make RDI point to the buffer.
1. Find the length of first string.
2. Copy the 1st string to the buffer, according to the length
3. Find the length of second string
4. Copy the 2nd string to the buffer, according to the length
5. 0-ended the buffer
6. Return the address of the buffer in RAX

Need to use lots of string instructions (stosb, scasb, rep, repne etc). I think you should be aware what you are dealing with. Calling C's "strcat" is probably not a bad idea if you're new.

Quote from: stressful on March 22, 2018, 01:42:42 PM

Maybe what you want is a correct buffer setup with large enough size to hold both strings.

Code: [Select]
section .bss conBuff: resb 100
Then the next logical steps would be;

0. Make RDI point to the buffer.
1. Find the length of first string.
2. Copy the 1st string to the buffer, according to the length
3. Find the length of second string
4. Copy the 2nd string to the buffer, according to the length
5. 0-ended the buffer
6. Return the address of the buffer in RAX

Need to use lots of string instructions (stosb, scasb, rep, repne etc). I think you should be aware what you are dealing with. Calling C's "strcat" is probably not a bad idea if you're new.

Thanks for your reply.
I was also thinking in similar lines and so followed your advice to write this implementation below.
But it still does not work, and returns only the first string.

Could you point out the problems please?

Code: [Select]

	segment .data

	segment .bss
concbuf:	resb 100

	segment .text

	global ConCatProc


;Calculate length of string in RCX
LengthOfStringProc:
	push rdi

	mov rdi, rcx
	xor rax, rax

_strlen_next:
    cmp         byte [rdi],  0
    jz    		_strlen_null_or_done
    inc 		rdi
    inc         rax
    jne         _strlen_next

_strlen_null_or_done:

	mov rdi, rcx

    ; return length of string in RAX

    pop rdi
	ret



;Proc called from C++
ConCatProc:

	mov r9, rcx
	call LengthOfStringProc
	mov r10, rax ;length of str 1

	mov r11, rdx
	mov rcx, rdx
	call LengthOfStringProc
	mov r12, rax ;length of str 1

	push rdi

	lea rdi, [concbuf] ;Make RDI point to the buffer.

	;Copy the 1st string to the buffer, according to the length
	cld
	mov rax, r9
	mov rcx, r10
	rep stosq

	;Copy the 2nd string to the buffer, according to the length
	lea rdi, [concbuf + r10+1]
	cld
	mov rax, r11
	mov rcx, r12
	rep stosq

	;0-ended the buffer
	lea rdi, [concbuf + r10+r12+1]
	mov rdi, '0'

	mov rax, [concbuf] ;should return the concatenated buffer? NOT WORKING, returns only the first string

	pop rdi

	ret

EDIT: I think I cannot use the QWORD variant of STOS. Have to use the STOSB. Is that correct?
Thanks a lot.

You're close. Only that I expected you use MOVSB instead of STOSQ. I suspect you're dealing with UTF characters?

Here a quick code, dealing with ASCII text, that is close enough to your interpretation and my suggested algorithm above...

Code: [Select]

        global ConCatProc

        section .bss
conbuf: resb 100

        section .text
ConCatProc:
        pushfq
        cld                ;all-forward operation
        mov     rdi,conbuf ;pointer to buffer
        mov     r8,rcx     ;first string
        mov     r9,rdx     ;second string

        mov     rsi,r8     ;find string1 length
        call    str_length
        mov     rcx,rax
        rep     movsb      ;copy to buffer. RSI--> RDI

        mov     rsi,r9     ;find string2 length
        call    str_length
        mov     rcx,rax
        rep     movsb      ;copy to buffer. RSI--> RDI

        xor     al,al
        stosb              ;0-ended the buffer
        mov     rax,conbuf ;return buffer's address
        popfq
        ret

;Find length of a C string
str_length:
        push    rdi
        mov     rdi,rsi
        mov     al,0
        mov     rcx,-1
        repne   scasb
        mov     rax,-2
        sub     rax,rcx
        pop     rdi
        ret

The key here is to not manually modify RDI as the buffer's pointer. Let it increments naturally in MOVSB instructions.

Quote from: stressful on March 22, 2018, 04:46:40 PM

You're close. Only that I expected you use MOVSB instead of STOSQ. I suspect you're dealing with UTF characters?

Here a quick code, dealing with ASCII text, that is close enough to your interpretation and my suggested algorithm above...

Code: [Select]
global ConCatProc section .bss conbuf: resb 100 section .text ConCatProc: pushfq cld ;all-forward operation mov rdi,conbuf ;pointer to buffer mov r8,rcx ;first string mov r9,rdx ;second string mov rsi,r8 ;find string1 length call str_length mov rcx,rax rep movsb ;copy to buffer. RSI--> RDI mov rsi,r9 ;find string2 length call str_length mov rcx,rax rep movsb ;copy to buffer. RSI--> RDI xor al,al stosb ;0-ended the buffer mov rax,conbuf ;return buffer's address popfq ret ;Find length of a C string str_length: push rdi mov rdi,rsi mov al,0 mov rcx,-1 repne scasb mov rax,-2 sub rax,rcx pop rdi ret
The key here is to not manually modify RDI as the buffer's pointer. Let it increments naturally in MOVSB instructions.

Thanks a lot!

I tried with MOBSB earlier, but somehow could not make it work correctly. :-[

Now thanks to you I've re-adjusted my code accordingly and it works great.

The fact is that the second MOVSB can also be replaced by a MOVSQ, like in the following code.
Also, since I have used LEA DI, buffer to load the address instead of MOV, I think I can get rid of CLD:

Code: [Select]

  global ConCatProc, LengthOfStringProc

        section .bss
concbuf: resb 100

        section .text
ConCatProc:
        pushfq
        ;;;;;;;;;;;;;cld                ;all-forward operation

        mov r9, rcx
		call LengthOfStringProc
		mov r10, rax ;length of str 1

		mov r11, rdx
		mov rcx, rdx
		call LengthOfStringProc
		mov r12, rax ;length of str 2

		lea rdi, [concbuf]  ;Make RDI point to the buffer.

        mov     rsi,r9
        mov     rcx,r10
        rep     movsb      ;copy to buffer. RSI--> RDI


        mov     rsi,r11
        mov     rcx,r12
        rep     movsq      ;copy to buffer. RSI--> RDI

        xor     al,al
        stosb               ;0-ended the buffer
        mov     rax,concbuf ;return buffer's address

        popfq
        ret

;Find length of a C string
LengthOfStringProc:
	push rdi

	mov rdi, rcx
	xor rax, rax

_strlen_next:
    cmp         byte [rdi],  0
    jz    		_strlen_null_or_done
    inc 		rdi
    inc         rax
    jne         _strlen_next

_strlen_null_or_done:

	mov rdi, rcx

    ; return length of string in RAX

    pop rdi
	ret

Nonetheless, I think your implementation of string length is far better optimized than mine.
Anyway, it works now, and thanks again for your help and time!

Quote from: prdas31 on March 23, 2018, 03:50:50 AM

Quote from: stressful on March 22, 2018, 04:46:40 PM
You're close. Only that I expected you use MOVSB instead of STOSQ. I suspect you're dealing with UTF characters?

Here a quick code, dealing with ASCII text, that is close enough to your interpretation and my suggested algorithm above...

Code: [Select]
global ConCatProc section .bss conbuf: resb 100 section .text ConCatProc: pushfq cld ;all-forward operation mov rdi,conbuf ;pointer to buffer mov r8,rcx ;first string mov r9,rdx ;second string mov rsi,r8 ;find string1 length call str_length mov rcx,rax rep movsb ;copy to buffer. RSI--> RDI mov rsi,r9 ;find string2 length call str_length mov rcx,rax rep movsb ;copy to buffer. RSI--> RDI xor al,al stosb ;0-ended the buffer mov rax,conbuf ;return buffer's address popfq ret ;Find length of a C string str_length: push rdi mov rdi,rsi mov al,0 mov rcx,-1 repne scasb mov rax,-2 sub rax,rcx pop rdi ret
The key here is to not manually modify RDI as the buffer's pointer. Let it increments naturally in MOVSB instructions.

Thanks a lot!

I tried with MOBSB earlier, but somehow could not make it work correctly. :-[

Now thanks to you I've re-adjusted my code accordingly and it works great.

The fact is that the second MOVSB can also be replaced by a MOVSQ, like in the following code.
Also, since I have used LEA DI, buffer to load the address instead of MOV, I think I can get rid of CLD:

Code: [Select]
global ConCatProc, LengthOfStringProc section .bss concbuf: resb 100 section .text ConCatProc: pushfq ;;;;;;;;;;;;;cld ;all-forward operation mov r9, rcx call LengthOfStringProc mov r10, rax ;length of str 1 mov r11, rdx mov rcx, rdx call LengthOfStringProc mov r12, rax ;length of str 2 lea rdi, [concbuf] ;Make RDI point to the buffer. mov rsi,r9 mov rcx,r10 rep movsb ;copy to buffer. RSI--> RDI mov rsi,r11 mov rcx,r12 rep movsq ;copy to buffer. RSI--> RDI xor al,al stosb ;0-ended the buffer mov rax,concbuf ;return buffer's address popfq ret ;Find length of a C string LengthOfStringProc: push rdi mov rdi, rcx xor rax, rax _strlen_next: cmp byte [rdi], 0 jz _strlen_null_or_done inc rdi inc rax jne _strlen_next _strlen_null_or_done: mov rdi, rcx ; return length of string in RAX pop rdi ret
Nonetheless, I think your implementation of string length is far better optimized than mine.
Anyway, it works now, and thanks again for your help and time!

One thing that I noticed about overflown stack is when I build this code (same code, just ported) in VStudio with MASM64. With the /GS option (canary check) the C++ compiler detected a stack overflow and I had to change the second MOVSQ to MOBSB to rectify this.

I've experimented this in GCC with these options, but none detected any error or thrown any warnings:

Code: [Select]

 -Wformat-overflow=2 -Wformat-truncation=2 -D_FORTIFY_SOURCE=1 -fstack-check -fstack-protector-strong -fstack-protector-explicit

However G++ detected some problemwhen I used the -fstack-protector in the compiler, which then throws this following error:

Code: [Select]

undefined reference to `__stack_chk_fail'
But now the problem is that it throws this error even after I changed second MOVSQ to MOBSB !!

Any ideas please?

Thanks.

movsq is really not necessary and potentially erronous. movsq will transfer 8-bytes at a time and updates RSI/RDI to 8 every time. In a tight heap allocation policy, RDI/RSI from movsq will likely to point to other area, breaking some section/memory barrier, hence "overflow" after addressing the last data, forcing you to use all g++ unnecessary "stack" switches. So you should stick to movsb.

Refer to movs/b/q/d/q here (http://www.felixcloutier.com/x86/MOVS:MOVSB:MOVSW:MOVSD:MOVSQ.html)

Quote

Also, since I have used LEA DI, buffer to load the address instead of MOV, I think I can get rid of CLD

CLD has nothing to do with LEA. Setting the CLD (DF=0) is important in string operations so that RDI/RSI will be incremented towards higher memory address after every string operations (movs, stos, scas, cmps + rep, repne, repe). If, the caller to your function set the string direction backward instead, via STD, then your function will give you funny results due to the strings being saved/read from backwards. So using CLD is a wise precaution measure. stressful's code has lots of educational materials in there. So you should stick to it until you know some more stuff.

NASM - The Netwide Assembler

NASM Forum => Using NASM => Topic started by: prdas31 on March 22, 2018, 05:50:20 AM