NASM Forum > Using NASM
Parse string to segments
Aurel:
Hi to all...
I'm totally new to the assembler world of coding.
First of all, I'm just a hobby programmer in BASIC.
The BASIC compiler I currently use has an option to add
inline assembly code.
I have written a BASIC-like interpreter in this compiler and I want more speed.
So my question is:
Is there a simple way to make a parser, or in other words an extractor?
For example, I have a string like this:
a$ = "string1 string2 string3"
I want to extract these 3 delimited strings into 3 new strings
with NASM code.
Is there maybe an example somewhere of how to do this?
Or maybe one of you has an idea how to do this?
Thanks in advance...
Aurel
Frank Kotler:
Well... we don't really know what "a$" looks like. I seem to recall that BASIC uses a byte prefix with the length (or am I thinking of Pascal?)... And by "extract to a new string", I guess you mean copy the string (to newly allocated memory?). Depending on what you're doing, you may not need to copy the delimited strings - just "finding" them might be enough. But ASSuming that you've got an "lstring" - byte prefix with the length - and want to copy the delimited strings to similar, newly allocated, strings...
--- Code: ---; nasm -f elf32 extract.asm
; ld -o extract extract.o -I/lib/ld-linux.so.2 -lc
global _start
extern malloc
%define DELIMITER ' '
section .text
_start:
bp1: ; just a breakpoint for debugging
mov esi, basicstring
lodsb ; get its length
movzx ebx, al ; transfer it to ebx
add ebx, esi ; end of string (so we know when we're done)
mov edx, pointers ; array of pointers to extracted strings
.top:
; first, figure out how long our delimited string is
xor ecx, ecx
.getlen:
cmp byte [esi + ecx], DELIMITER
jz .gotlen
inc ecx
; if we're at the end of string, we won't find another delimiter, so check!
lea edi, [esi + ecx]
cmp edi, ebx
jnz .getlen
.gotlen:
; then, allocate some memory for it
inc ecx ; we need an extra byte for the length!
push edx ; save our edx - malloc trashes it!
push ecx ; both the parameter to malloc, and "save ecx"
; push ecx - for stdcall (Windows API) push it again!
call malloc ; get some memory for our new string
pop ecx ; restore our length
pop edx ; restore our edx (pointers)
; should check if malloc succeeded - I ASSume it does :(
mov [edx], eax ; save the address we got in "pointers" array
add edx, 4 ; and get ready for next one
mov edi, eax ; make our address "destination" for movsb
dec ecx ; we don't need the "extra" byte anymore
mov al, cl ; save the length byte
stosb
rep movsb ; and copy the string
inc esi ; we left esi pointed at the delimiter - move past it
inc dword [stringcount] ; count our delimited strings
cmp esi, ebx ; are we done?
jb .top ; no? do more.
; we're finished - print 'em, just to prove it worked :)
; this part is specific to Linux.
mov esi, pointers
print_next:
mov ecx, [esi] ; address of our delimited string
add esi, 4 ; get ready for next one
movzx edx, byte [ecx] ; Linux wants the length in edx
inc ecx ; move past the length byte
mov ebx, 1 ; STDOUT
mov eax, 4 ; __NR_write
int 80h ; call kernel
mov ecx, newline
mov edx, 1
mov ebx, 1
mov eax, 4
int 80h
dec dword [stringcount]
jnz print_next
exit:
xor ebx, ebx ; exit status 0 (ebx still holds 1 from the last write)
mov eax, 1 ; __NR_exit
int 80h
;-----------------------
section .data
basicstring db .end - basicstring - 1
db "string1 string2 string3"
.end:
newline db 10
;------------------
section .bss
pointers resd 128
stringcount resd 1
;----------------------
--- End code ---
That probably isn't what you want - not in Linux, anyway - but maybe it'll give you an idea how to approach it.
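If only "finding" the pieces is enough, as mentioned at the top of this post, the copying and the malloc call can be dropped. Here is a rough sketch of that idea, written in Python rather than NASM just to keep the logic short. The name find_spans and the (offset, length) pairs are made up for illustration, and it assumes the same byte-prefixed "lstring" layout as basicstring in the listing.

```python
def find_spans(lstring: bytes, delimiter: bytes = b" "):
    """Return (offset, length) pairs for each delimited piece of a
    length-prefixed string. Nothing is copied or modified."""
    length = lstring[0]            # first byte holds the length
    pos = 1                        # first character follows the length byte
    end = 1 + length               # one past the last character
    spans = []
    while pos < end:
        stop = lstring.find(delimiter, pos, end)   # same job as the .getlen loop
        if stop == -1:             # no more delimiters: piece runs to the end
            stop = end
        spans.append((pos, stop - pos))
        pos = stop + 1             # step past the delimiter
    return spans


text = b"string1 string2 string3"
basicstring = bytes([len(text)]) + text   # length byte, then the characters

spans = find_spans(basicstring)
for offset, size in spans:
    print(basicstring[offset:offset + size].decode("ascii"))
```

In the assembler version this would roughly mean storing esi and ecx into an array at .gotlen instead of calling malloc and copying.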
Best,
Frank
Aurel:
First of all, thank you very much, Frank. I now understand a little better
how things work.
This is about a BASIC compiler called EBasic, which has NASM.
This compiler is only for Windows.
One guy helped me too, and here is his code:
Basic code:
--- Code: ---DECLARE Split(Inp$:STRING,Deliminator:CHAR,RetArray:POINTER),INT
CONST MaxSplit = 17
DEF A$,src:STRING
def I:INT
'def w:pointer
DEF StrPArray[MaxSplit]:INT
A$ = "This string will be"
OPENCONSOLE
'PRINT A$
src=A$:A$=""
W = Split(src," ",StrPArray)
PRINT "Number of strings:",str$(W)
print
FOR I = 0 TO W-1
IF I = 0 then PRINT *<STRING>(StrPArray[I])
IF I = 1 THEN PRINT *<STRING>(StrPArray[I])
IF I = 2 THEN PRINT *<STRING>(StrPArray[I])
IF I = 3 THEN PRINT *<STRING>(StrPArray[I])
IF I = 4 THEN PRINT *<STRING>(StrPArray[I])
IF I = 5 THEN PRINT *<STRING>(StrPArray[I])
IF I = 6 THEN PRINT *<STRING>(StrPArray[I])
IF I = 7 THEN PRINT *<STRING>(StrPArray[I])
IF I = 8 THEN PRINT *<STRING>(StrPArray[I])
IF I = 9 THEN PRINT *<STRING>(StrPArray[I])
IF I = 10 THEN PRINT *<STRING>(StrPArray[I])
IF I = 11 THEN PRINT *<STRING>(StrPArray[I])
IF I = 12 THEN PRINT *<STRING>(StrPArray[I])
IF I = 13 THEN PRINT *<STRING>(StrPArray[I])
IF I = 14 then PRINT *<STRING>(StrPArray[I])
IF I = 15 THEN PRINT *<STRING>(StrPArray[I])
IF I = 16 THEN PRINT *<STRING>(StrPArray[I])
NEXT I
DO
UNTIL INKEY$<>""
END
--- End code ---
and here is the assembler code:
--- Code: ---_asm
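Split: push ebp ; standard stack frame
mov ebp, esp
push esi ; preserve callee-saved registers
push edi
push ebx
mov edi, [ebp+8] ; edi = Inp$ (address of the zero-terminated source string)
mov esi, [ebp+16] ; esi = RetArray (where the piece addresses are stored)
xor ecx, ecx ; ecx = characters scanned (counted but never used)
xor ebx, ebx ; ebx = number of pieces found
movzx eax, byte [ebp+12] ; al = Deliminator, ah = 0
C01:mov [esi], edi ; store the start address of the current piece
inc ebx ; count it
C00:cmp byte [edi], 0 ; end of the source string?
jz Exit
inc ecx
scasb ; compare al with [edi], then advance edi
jnz C00 ; not the delimiter: keep scanning
lea esi, [esi+4] ; move to the next array slot
mov [edi-1], ah ; overwrite the delimiter with 0 to terminate the piece
jmp C01
Exit: mov dword [esi+4], 0 ; zero the slot after the last pointer; can be omitted since we have a return value
xchg eax, ebx ; return the piece count in eax
pop ebx
pop edi
pop esi
leave
ret 0x0C ; stdcall: callee removes the 12 bytes of arguments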
_endasm
--- End code ---
It works, but the original string is destroyed. That's not important, though.
The string is a piece of text inside quotes, and each new string is put into
an array element, as you can see.
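To show why the original string ends up destroyed, here is a rough model of what the Split routine does, written in Python rather than NASM to keep it short. It is only a sketch: split_in_place is a made-up name, and it records offsets into the buffer where the real routine stores addresses.

```python
def split_in_place(buf: bytearray, delimiter: int):
    """Model of the Split routine: buf is a zero-terminated string.
    Each delimiter is overwritten with 0 and the start offset of every
    piece is recorded, so buf itself is changed."""
    starts = [0]                   # the first piece starts at the beginning
    i = 0
    while buf[i] != 0:             # stop at the terminating zero
        if buf[i] == delimiter:
            buf[i] = 0             # terminate the piece that just ended
            starts.append(i + 1)   # the next piece starts right after
        i += 1
    return starts


src = bytearray(b"This string will be\x00")
starts = split_in_place(src, ord(" "))
print(len(starts))                 # number of pieces
print(src)                         # the spaces are now zero bytes
for s in starts:
    end = src.index(0, s)          # read up to the next zero byte
    print(src[s:end].decode("ascii"))
```

Because the pieces stay inside the original buffer, nothing has to be allocated or copied. The price is that the spaces are gone afterwards.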
Your explanation of how the assembler works is great. It's the first time I have seen
an explanation like this, and I understand much better now.
Of course I will try your code too.
Thanks again...
Aurel
munair:
--- Quote from: Frank Kotler on September 26, 2010, 03:09:56 PM ---Well... we don't really know what "a$" looks like. I seem to recall that BASIC uses a byte prefix with the length (or am I thinking of Pascal?)... And by "extract to a new string", I guess you mean copy the string (to newly allocated memory?). Depending on what you're doing, you may not need to copy the delimited strings - just "finding" them might be enough. But ASSuming that you've got an "lstring" - byte prefix with the length - and want to copy the delimited strings to similar, newly allocated, strings...
--- End quote ---
When linking this I get:
[frank@frank-pc xmpl]$ ld -o extract extract.o -I/lib/ld-linux.so.2 -lc
ld: i386 architecture of input file `extract.o' is incompatible with i386:x86-64 output
and:
[frank@frank-pc xmpl]$ ld -m elf_i386 -o extract extract.o -I/lib/ld-linux.so.2 -lc
ld: skipping incompatible /usr/lib/libc.so when searching for -lc
ld: skipping incompatible /usr/lib/libc.a when searching for -lc
ld: cannot find -lc
ld: skipping incompatible /usr/lib/libc.so when searching for -lc
Frank Kotler:
Dunno.
--- Code: ---apt-get install gcc-multilib
--- End code ---
perhaps?
Or do it in 64 bits?
Best,
Frank