NASM - The Netwide Assembler
NASM Forum => Using NASM => Topic started by: Aurel on September 25, 2010, 03:43:28 PM
-
Hi to all...
I'm totally new to the assembler world of coding.
First of all, I'm just a hobby programmer in BASIC.
My BASIC compiler (which I currently use) has an option to add
inline assembly code.
I have written a BASIC-like interpreter in this compiler and I want more speed.
So my question is:
Is there a simple way to make a parser, or in other words an extractor?
For example, I have a string like this:
a$ = "string1 string2 string3"
I want to extract these 3 delimited strings into 3 new strings
with NASM code.
Is there maybe an example somewhere of how to do this?
Or maybe one of you has an idea how to do this?
Thanks in advance...
Aurel
-
Well... we don't really know what "a$" looks like. I seem to recall that BASIC uses a byte prefix with the length (or am I thinking of Pascal?)... And by "extract to a new string", I guess you mean copy the string (to newly allocated memory?). Depending on what you're doing, you may not need to copy the delimited strings - just "finding" them might be enough. But ASSuming that you've got an "lstring" - byte prefix with the length - and want to copy the delimited strings to similar, newly allocated, strings...
; nasm -f elf32 extract.asm
; ld -o extract extract.o -I/lib/ld-linux.so.2 -lc
global _start
extern malloc
%define DELIMITER ' '
section .text
_start:
bp1: ; just a breakpoint for debugging
mov esi, basicstring
lodsb ; get its length
movzx ebx, al ; transfer it to ebx
add ebx, esi ; end of string (so we know when we're done)
mov edx, pointers ; array of pointers to extracted strings
.top:
; first, figure out how long our delimited string is
xor ecx, ecx
.getlen:
cmp byte [esi + ecx], DELIMITER
jz .gotlen
inc ecx
; if we're at the end of string, we won't find another delimiter, so check!
lea edi, [esi + ecx]
cmp edi, ebx
jnz .getlen
.gotlen:
; then, allocate some memory for it
inc ecx ; we need an extra byte for the length!
push edx ; save our edx - malloc trashes it!
push ecx ; both the parameter to malloc, and "save ecx"
; push ecx - for stdcall (Windows API) push it again!
call malloc ; get some memory for our new string
pop ecx ; restore our length
pop edx ; restore our edx (pointers)
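; check that malloc succeeded - it returns 0 (NULL) on failure
test eax, eax
jnz .alloc_ok
mov ebx, 1 ; exit status 1
mov eax, 1 ; __NR_exit
int 80h
.alloc_ok: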
mov [edx], eax ; save the address we got in "pointers" array
add edx, 4 ; and get ready for next one
mov edi, eax ; make our address "destination" for movsb
dec ecx ; we don't need the "extra" byte anymore
mov al, cl ; save the length byte
stosb
rep movsb ; and copy the string
inc esi ; we left esi pointed at the delimiter - move past it
inc dword [stringcount] ; count our delimited strings
cmp esi, ebx ; are we done?
jb .top ; no? do more.
; we're finished - print 'em, just to prove it worked :)
; this part is specific to Linux.
mov esi, pointers
print_next:
mov ecx, [esi] ; address of our delimited string
add esi, 4 ; get ready for next one
movzx edx, byte [ecx] ; Linux wants the length in edx
inc ecx ; move past the length byte
mov ebx, 1 ; STDOUT
mov eax, 4 ; __NR_write
int 80h ; call kernel
mov ecx, newline
mov edx, 1
mov ebx, 1
mov eax, 4
int 80h
dec dword [stringcount]
jnz print_next
exit:
xor ebx, ebx ; exit status 0 (ebx still held 1 from the last write)
mov eax, 1 ; __NR_exit
int 80h
;-----------------------
section .data
basicstring db .end - basicstring - 1
db "string1 string2 string3"
.end:
newline db 10
;------------------
section .bss
pointers resd 128
stringcount resd 1
;----------------------
That probably isn't what you want - not in Linux, anyway - but maybe it'll give you an idea how to approach it.
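As a rough sketch of the "just finding" variant mentioned at the top: instead of copying, it records the address and length of each delimited string in a table. It assumes the same byte-prefixed string layout and 32-bit Linux; the names "table" and "find.asm" are made up for illustration, and no libc is needed:

```nasm
; nasm -f elf32 find.asm
; ld -m elf_i386 -o find find.o
global _start
%define DELIMITER ' '

section .text
_start:
    mov esi, basicstring
    movzx ebx, byte [esi]   ; length byte
    inc esi                 ; esi -> first character
    add ebx, esi            ; ebx -> one past the last character
    mov edi, table          ; (address, length) pairs go here
    xor ebp, ebp            ; ebp = number of substrings found
.next:
    cmp esi, ebx
    jae .done               ; nothing left to scan
    mov [edi], esi          ; remember where this substring starts
    xor ecx, ecx            ; ecx = its length so far
.scan:
    cmp byte [esi], DELIMITER
    je .found
    inc esi
    inc ecx
    cmp esi, ebx            ; ran off the end? then the substring ends here
    jb .scan
.found:
    mov [edi + 4], ecx      ; store the length next to the address
    add edi, 8
    inc ebp
    inc esi                 ; step over the delimiter (harmless at the end)
    jmp .next
.done:
    ; print what we found - this part is specific to Linux
    mov esi, table
    test ebp, ebp
    jz .exit                ; empty input: nothing to print
.print:
    mov ecx, [esi]          ; address of the substring
    mov edx, [esi + 4]      ; its length
    mov ebx, 1              ; STDOUT
    mov eax, 4              ; __NR_write
    int 80h
    mov ecx, newline
    mov edx, 1
    mov ebx, 1
    mov eax, 4
    int 80h
    add esi, 8              ; next (address, length) pair
    dec ebp
    jnz .print
.exit:
    xor ebx, ebx            ; exit status 0
    mov eax, 1              ; __NR_exit
    int 80h

section .data
basicstring db .end - basicstring - 1
            db "string1 string2 string3"
.end:
newline     db 10

section .bss
table       resd 2 * 256    ; room for one pair per byte of a 255-byte string

section .note.GNU-stack noalloc noexec nowrite progbits
```

The original string is left untouched, so this only helps if the caller can work with address/length pairs rather than separate string copies.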
Best,
Frank
-
First of all, thank you very much, Frank. I now understand a little better
how things work.
This is about a BASIC compiler called EBasic, which includes NASM.
This compiler is only for Windows.
Someone helped me too, and here is his code:
BASIC code:
DECLARE Split(Inp$:STRING,Deliminator:CHAR,RetArray:POINTER),INT
CONST MaxSplit = 17
DEF A$,src:STRING
def I:INT
'def w:pointer
DEF StrPArray[MaxSplit]:INT
A$ = "This string will be"
OPENCONSOLE
'PRINT A$
src=A$:A$=""
W = Split(src," ",StrPArray)
PRINT "Number of strings:",str$(W)
print
FOR I = 0 TO W-1
IF I < MaxSplit THEN PRINT *<STRING>(StrPArray[I])
NEXT I
DO
UNTIL INKEY$<>""
END
and here is the assembler code:
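_asm
Split: push ebp
mov ebp, esp
push esi
push edi
push ebx
mov edi, [ebp+8] ; edi = Inp$ (scanned until a 0 byte)
mov esi, [ebp+16] ; esi = RetArray (array of pointers)
xor ecx, ecx ; character counter (not used for the result)
xor ebx, ebx ; ebx = number of strings found
movzx eax, byte [ebp+12] ; al = Deliminator, ah = 0
C01:mov [esi], edi ; store the start of the current substring
inc ebx
C00:cmp byte [edi], 0 ; end of the source string?
jz Exit
inc ecx
scasb ; compare al with [edi], then advance edi
jnz C00 ; not the delimiter - keep scanning
lea esi, [esi+4] ; next slot in RetArray
mov [edi-1], ah ; overwrite the delimiter with 0 to terminate the substring
jmp C01
Exit: mov dword [esi+4], 0 ;Can be omitted since we have a return value.
xchg eax, ebx ; return the count in eax
pop ebx
pop edi
pop esi
leave
ret 0x0C ; callee removes the 12 bytes of arguments
_endasm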
It works, but the original string is destroyed (each delimiter is overwritten with a zero byte); that's not important, though.
A string is a piece of text inside quotes, and each new string is put into
an array element, as you can see.
Your explanation of how assembler works is great; it's the first time I've seen
an explanation like this, and I understand much better now.
Of course I will try your code too.
Thanks again...
Aurel
-
When linking this I get:
[frank@frank-pc xmpl]$ ld -o extract extract.o -I/lib/ld-linux.so.2 -lc
ld: i386 architecture of input file `extract.o' is incompatible with i386:x86-64 output
and:
[frank@frank-pc xmpl]$ ld -m elf_i386 -o extract extract.o -I/lib/ld-linux.so.2 -lc
ld: skipping incompatible /usr/lib/libc.so when searching for -lc
ld: skipping incompatible /usr/lib/libc.a when searching for -lc
ld: cannot find -lc
ld: skipping incompatible /usr/lib/libc.so when searching for -lc
-
Dunno.
apt-get install gcc-multilib
perhaps?
Or do it in 64 bits?
Best,
Frank
-
What calling convention is used by your basic compiler? (Which compiler?)
The use of the stack to pass arguments to a function/procedure in x86-64 mode isn't usual...
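For comparison, a minimal sketch of how a malloc call would look under the 64-bit System V calling convention used on Linux, where the first integer argument travels in rdi rather than on the stack. The file name "alloc64.asm" and the 24-byte size are made up for illustration, and linking through gcc with -no-pie is an assumption about the toolchain:

```nasm
; nasm -f elf64 alloc64.asm
; gcc -no-pie -o alloc64 alloc64.o
global main
extern malloc
extern free

section .text
main:
    push rbx            ; callee-saved register; the push also realigns the stack to 16 bytes
    mov edi, 24         ; first argument (size) goes in rdi, not on the stack
    call malloc
    test rax, rax       ; malloc returns 0 (NULL) on failure
    jz .fail
    mov rbx, rax        ; keep the pointer in a register that survives calls
    mov rdi, rbx        ; first argument for free
    call free
    xor eax, eax        ; return 0 from main
    pop rbx
    ret
.fail:
    mov eax, 1          ; return 1 from main
    pop rbx
    ret

section .note.GNU-stack noalloc noexec nowrite progbits
```

Nothing is pushed for the call itself, so there is also nothing to pop afterwards, unlike the 32-bit cdecl version.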
-
I should have stopped after "Dunno".
Code I posted was 32 bit, probably assembled and linked on a 32 bit system. I have no memory of where those parameters came from.
Where are we now? Dunno.
Probably best to start over: What OS and what do you need to do?
Best,
Frank
-
I compile 32-bit code with NASM and LD on Manjaro Linux x64. The code for the SharpBASIC 32-bit compiler I work on (not to be confused with the BASIC referred to by the OP) compiles and runs fine. But Frank's example doesn't. I'm sure it's because of ld-linux.so.2, which I don't use for my code.
-
The question is there because of this error:
[frank@frank-pc xmpl]$ ld -o extract extract.o -I/lib/ld-linux.so.2 -lc
ld: i386 architecture of input file `extract.o' is incompatible with i386:x86-64 output
You are trying to link an ELF32 object file into an ELF64 executable. So, which is it: 64 or 32 bits?
-
The only object file is extract.o, which is 32 bits. I finally got it linked with:
ld -o extract extract.o -m elf_i386 -L /usr/lib32 -lc -dynamic-linker /lib/ld-linux.so.2
It also links without "-dynamic-linker", but the resulting executable doesn't run: although the file is there, I get "no such file or directory".
(https://sharpbasic.com/images/extract.png)
-
I'm happy that 'extract' works. I will use it as an example for string manipulation routines in the SharpBASIC compiler.