NASM - The Netwide Assembler
NASM Forum => Using NASM => Topic started by: prdas31 on March 22, 2018, 05:50:20 AM
-
Request someone to point out the errors in this code please. [g++, nasm64, windows 10, fastcall]
C++ caller:
extern "C" char* ConCatProc(char*, char*);
int main() {
char str1[] = "Test string from C++";
char str2[] = "Test string from NASM64";
char* rax = ConCatProc(str1, str2);
cout << "Concat: " <<rax<< endl;
return 0;
}
The called Nasm routine:
segment .data
concbuf times 2 dq " ",'0'
segment .text
global ConCatProc
ConCatProc:
pushfq
mov [concbuf], rcx
mov [concbuf + 1], rdx
mov rax, [concbuf]
popfq
ret
Appreciate if someone can help correcting the problem.
Thanks.
-
Maybe what you want is a correct buffer setup with large enough size to hold both strings.
section .bss
conBuff: resb 100
Then the next logical steps would be;
0. Make RDI point to the buffer.
1. Find the length of first string.
2. Copy the 1st string to the buffer, according to the length
3. Find the length of second string
4. Copy the 2nd string to the buffer, according to the length
5. 0-ended the buffer
6. Return the address of the buffer in RAX
Need to use lots of string instructions (stosb, scasb, rep, repne etc). I think you should be aware what you are dealing with. Calling C's "strcat" is probably not a bad idea if you're new.
-
Maybe what you want is a correct buffer setup with large enough size to hold both strings.
section .bss
conBuff: resb 100
Then the next logical steps would be;
0. Make RDI point to the buffer.
1. Find the length of first string.
2. Copy the 1st string to the buffer, according to the length
3. Find the length of second string
4. Copy the 2nd string to the buffer, according to the length
5. 0-ended the buffer
6. Return the address of the buffer in RAX
Need to use lots of string instructions (stosb, scasb, rep, repne etc). I think you should be aware what you are dealing with. Calling C's "strcat" is probably not a bad idea if you're new.
Thanks for your reply.
I was also thinking in similar lines and so followed your advice to write this implementation below.
But it still does not work, and returns only the first string.
Could you point out the problems please?
segment .data
segment .bss
concbuf: resb 100
segment .text
global ConCatProc
;Calculate length of string in RCX
LengthOfStringProc:
push rdi
mov rdi, rcx
xor rax, rax
_strlen_next:
cmp byte [rdi], 0
jz _strlen_null_or_done
inc rdi
inc rax
jne _strlen_next
_strlen_null_or_done:
mov rdi, rcx
; return length of string in RAX
pop rdi
ret
;Proc called from C++
ConCatProc:
mov r9, rcx
call LengthOfStringProc
mov r10, rax ;length of str 1
mov r11, rdx
mov rcx, rdx
call LengthOfStringProc
mov r12, rax ;length of str 1
push rdi
lea rdi, [concbuf] ;Make RDI point to the buffer.
;Copy the 1st string to the buffer, according to the length
cld
mov rax, r9
mov rcx, r10
rep stosq
;Copy the 2nd string to the buffer, according to the length
lea rdi, [concbuf + r10+1]
cld
mov rax, r11
mov rcx, r12
rep stosq
;0-ended the buffer
lea rdi, [concbuf + r10+r12+1]
mov rdi, '0'
mov rax, [concbuf] ;should return the concatenated buffer? NOT WORKING, returns only the first string
pop rdi
ret
EDIT: I think I cannot use the QWORD variant of STOS. Have to use the STOSB. Is that correct?
Thanks a lot.
-
You're close. Only that I expected you use MOVSB instead of STOSQ. I suspect you're dealing with UTF characters?
Here a quick code, dealing with ASCII text, that is close enough to your interpretation and my suggested algorithm above...
global ConCatProc
section .bss
conbuf: resb 100
section .text
ConCatProc:
pushfq
cld ;all-forward operation
mov rdi,conbuf ;pointer to buffer
mov r8,rcx ;first string
mov r9,rdx ;second string
mov rsi,r8 ;find string1 length
call str_length
mov rcx,rax
rep movsb ;copy to buffer. RSI--> RDI
mov rsi,r9 ;find string2 length
call str_length
mov rcx,rax
rep movsb ;copy to buffer. RSI--> RDI
xor al,al
stosb ;0-ended the buffer
mov rax,conbuf ;return buffer's address
popfq
ret
;Find length of a C string
str_length:
push rdi
mov rdi,rsi
mov al,0
mov rcx,-1
repne scasb
mov rax,-2
sub rax,rcx
pop rdi
ret
The key here is to not manually modify RDI as the buffer's pointer. Let it increments naturally in MOVSB instructions.
-
You're close. Only that I expected you use MOVSB instead of STOSQ. I suspect you're dealing with UTF characters?
Here a quick code, dealing with ASCII text, that is close enough to your interpretation and my suggested algorithm above...
global ConCatProc
section .bss
conbuf: resb 100
section .text
ConCatProc:
pushfq
cld ;all-forward operation
mov rdi,conbuf ;pointer to buffer
mov r8,rcx ;first string
mov r9,rdx ;second string
mov rsi,r8 ;find string1 length
call str_length
mov rcx,rax
rep movsb ;copy to buffer. RSI--> RDI
mov rsi,r9 ;find string2 length
call str_length
mov rcx,rax
rep movsb ;copy to buffer. RSI--> RDI
xor al,al
stosb ;0-ended the buffer
mov rax,conbuf ;return buffer's address
popfq
ret
;Find length of a C string
str_length:
push rdi
mov rdi,rsi
mov al,0
mov rcx,-1
repne scasb
mov rax,-2
sub rax,rcx
pop rdi
ret
The key here is to not manually modify RDI as the buffer's pointer. Let it increments naturally in MOVSB instructions.
Thanks a lot!
I tried with MOBSB earlier, but somehow could not make it work correctly. :-[
Now thanks to you I've re-adjusted my code accordingly and it works great.
The fact is that the second MOVSB can also be replaced by a MOVSQ, like in the following code.
Also, since I have used LEA DI, buffer to load the address instead of MOV, I think I can get rid of CLD:
global ConCatProc, LengthOfStringProc
section .bss
concbuf: resb 100
section .text
ConCatProc:
pushfq
;;;;;;;;;;;;;cld ;all-forward operation
mov r9, rcx
call LengthOfStringProc
mov r10, rax ;length of str 1
mov r11, rdx
mov rcx, rdx
call LengthOfStringProc
mov r12, rax ;length of str 2
lea rdi, [concbuf] ;Make RDI point to the buffer.
mov rsi,r9
mov rcx,r10
rep movsb ;copy to buffer. RSI--> RDI
mov rsi,r11
mov rcx,r12
rep movsq ;copy to buffer. RSI--> RDI
xor al,al
stosb ;0-ended the buffer
mov rax,concbuf ;return buffer's address
popfq
ret
;Find length of a C string
LengthOfStringProc:
push rdi
mov rdi, rcx
xor rax, rax
_strlen_next:
cmp byte [rdi], 0
jz _strlen_null_or_done
inc rdi
inc rax
jne _strlen_next
_strlen_null_or_done:
mov rdi, rcx
; return length of string in RAX
pop rdi
ret
Nonetheless, I think your implementation of string length is far better optimized than mine.
Anyway, it works now, and thanks again for your help and time!
-
You're close. Only that I expected you use MOVSB instead of STOSQ. I suspect you're dealing with UTF characters?
Here a quick code, dealing with ASCII text, that is close enough to your interpretation and my suggested algorithm above...
global ConCatProc
section .bss
conbuf: resb 100
section .text
ConCatProc:
pushfq
cld ;all-forward operation
mov rdi,conbuf ;pointer to buffer
mov r8,rcx ;first string
mov r9,rdx ;second string
mov rsi,r8 ;find string1 length
call str_length
mov rcx,rax
rep movsb ;copy to buffer. RSI--> RDI
mov rsi,r9 ;find string2 length
call str_length
mov rcx,rax
rep movsb ;copy to buffer. RSI--> RDI
xor al,al
stosb ;0-ended the buffer
mov rax,conbuf ;return buffer's address
popfq
ret
;Find length of a C string
str_length:
push rdi
mov rdi,rsi
mov al,0
mov rcx,-1
repne scasb
mov rax,-2
sub rax,rcx
pop rdi
ret
The key here is to not manually modify RDI as the buffer's pointer. Let it increments naturally in MOVSB instructions.
Thanks a lot!
I tried with MOBSB earlier, but somehow could not make it work correctly. :-[
Now thanks to you I've re-adjusted my code accordingly and it works great.
The fact is that the second MOVSB can also be replaced by a MOVSQ, like in the following code.
Also, since I have used LEA DI, buffer to load the address instead of MOV, I think I can get rid of CLD:
global ConCatProc, LengthOfStringProc
section .bss
concbuf: resb 100
section .text
ConCatProc:
pushfq
;;;;;;;;;;;;;cld ;all-forward operation
mov r9, rcx
call LengthOfStringProc
mov r10, rax ;length of str 1
mov r11, rdx
mov rcx, rdx
call LengthOfStringProc
mov r12, rax ;length of str 2
lea rdi, [concbuf] ;Make RDI point to the buffer.
mov rsi,r9
mov rcx,r10
rep movsb ;copy to buffer. RSI--> RDI
mov rsi,r11
mov rcx,r12
rep movsq ;copy to buffer. RSI--> RDI
xor al,al
stosb ;0-ended the buffer
mov rax,concbuf ;return buffer's address
popfq
ret
;Find length of a C string
LengthOfStringProc:
push rdi
mov rdi, rcx
xor rax, rax
_strlen_next:
cmp byte [rdi], 0
jz _strlen_null_or_done
inc rdi
inc rax
jne _strlen_next
_strlen_null_or_done:
mov rdi, rcx
; return length of string in RAX
pop rdi
ret
Nonetheless, I think your implementation of string length is far better optimized than mine.
Anyway, it works now, and thanks again for your help and time!
One thing that I noticed about overflown stack is when I build this code (same code, just ported) in VStudio with MASM64. With the /GS option (canary check) the C++ compiler detected a stack overflow and I had to change the second MOVSQ to MOBSB to rectify this.
I've experimented this in GCC with these options, but none detected any error or thrown any warnings:
-Wformat-overflow=2 -Wformat-truncation=2 -D_FORTIFY_SOURCE=1 -fstack-check -fstack-protector-strong -fstack-protector-explicit
However G++ detected some problemwhen I used the -fstack-protector in the compiler, which then throws this following error:
undefined reference to `__stack_chk_fail'
But now the problem is that it throws this error even after I changed second MOVSQ to MOBSB !!
Any ideas please?
Thanks.
-
movsq is really not necessary and potentially erronous. movsq will transfer 8-bytes at a time and updates RSI/RDI to 8 every time. In a tight heap allocation policy, RDI/RSI from movsq will likely to point to other area, breaking some section/memory barrier, hence "overflow" after addressing the last data, forcing you to use all g++ unnecessary "stack" switches. So you should stick to movsb.
Refer to movs/b/q/d/q here (http://www.felixcloutier.com/x86/MOVS:MOVSB:MOVSW:MOVSD:MOVSQ.html)
-
Also, since I have used LEA DI, buffer to load the address instead of MOV, I think I can get rid of CLD
CLD has nothing to do with LEA. Setting the CLD (DF=0) is important in string operations so that RDI/RSI will be incremented towards higher memory address after every string operations (movs, stos, scas, cmps + rep, repne, repe). If, the caller to your function set the string direction backward instead, via STD, then your function will give you funny results due to the strings being saved/read from backwards. So using CLD is a wise precaution measure. stressful's code has lots of educational materials in there. So you should stick to it until you know some more stuff.