Parse string to segments

Aurel:
Hi to all...
I'm totally new to the assembler world of coding.
First of all, I'm just a hobby programmer in BASIC.
My BASIC compiler (which I currently use) has an option to add
inline assembly code.
I have written a BASIC-like interpreter in this compiler and I want more speed.
So my question is:
How can I make a parser, or in other words an extractor, in a simple way?
For example, I have a string like this:
a$ = "string1 string2 string3"
I want to extract these 3 delimited strings into 3 new strings
with NASM code.
Is there maybe an example somewhere of how to do this?
Or maybe one of you has an idea how to do this?

Thanks in advance...
Aurel

Frank Kotler:
Well... we don't really know what "a$" looks like. I seem to recall that BASIC uses a byte prefix with the length (or am I thinking of Pascal?)... And by "extract to a new string", I guess you mean copy the string (to newly allocated memory?). Depending on what you're doing, you may not need to copy the delimited strings - just "finding" them might be enough. But ASSuming that you've got an "lstring" - byte prefix with the length - and want to copy the delimited strings to similar, newly allocated, strings...


--- Code: ---; nasm -f elf32 extract.asm
; ld -o extract extract.o -I/lib/ld-linux.so.2 -lc


global _start
extern malloc

%define DELIMITER ' '

section .text
_start:
bp1: ; just a breakpoint for debugging

    mov esi, basicstring
    lodsb ; get its length
    movzx ebx, al ; transfer it to ebx
    add ebx, esi ; end of string (so we know when we're done)

    mov edx, pointers ; array of pointers to extracted strings
   
.top:

; first, figure out how long our delimited string is
    xor ecx, ecx
.getlen:
    cmp byte [esi + ecx], DELIMITER
    jz .gotlen
    inc ecx
; if we're at the end of string, we won't find another delimiter, so check!
    lea edi, [esi + ecx]
    cmp edi, ebx
    jnz .getlen
.gotlen:

; then, allocate some memory for it
    inc ecx ; we need an extra byte for the length!

    push edx ; save our edx - malloc trashes it!
    push ecx ; both the parameter to malloc, and "save ecx"
    ; push ecx - for stdcall (Windows API) push it again!
    call malloc ; get some memory for our new string
    pop ecx ; restore our length
    pop edx ; restore our edx (pointers)

    ; should check if malloc succeeded - I ASSume it does :(
    mov [edx], eax ; save the address we got in "pointers" array
    add edx, 4 ; and get ready for next one
   
    mov edi, eax ; make our address "destination" for movsb
    dec ecx ; we don't need the "extra" byte anymore

    mov al, cl ; save the length byte
    stosb
    rep movsb ; and copy the string

    inc esi ; we left esi pointed at the delimiter - move past it
    inc dword [stringcount] ; count our delimited strings
    cmp esi, ebx ; are we done?
    jb .top ; no? do more.

; we're finished - print 'em, just to prove it worked :)
; this part is specific to Linux.

    mov esi, pointers
print_next:
    mov ecx, [esi] ; address of our delimited string
    add esi, 4 ; get ready for next one
    movzx edx, byte [ecx] ; Linux wants the length in edx
    inc ecx ; move past the length byte
    mov ebx, 1 ; STDOUT
    mov eax, 4 ; __NR_write
    int 80h ; call kernel

    mov ecx, newline
    mov edx, 1
    mov ebx, 1
    mov eax, 4
    int 80h

    dec dword [stringcount]
    jnz print_next

exit:
    xor ebx, ebx ; exit status 0 (ebx still holds 1 from the write above)
    mov eax, 1 ; __NR_exit
    int 80h
;-----------------------

section .data

basicstring db .end - basicstring - 1
    db "string1 string2 string3"
.end:

newline db 10

;------------------
section .bss
    pointers resd 128
    stringcount resd 1
;----------------------   


--- End code ---

That probably isn't what you want - not in Linux, anyway - but maybe it'll give you an idea how to approach it.
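
On the point above that just "finding" the delimited strings might be enough: here is a minimal sketch, assuming the same length-prefixed string, that records an (address, length) pair for each piece instead of copying it. The names "find.asm" and "slices" are made up for illustration. Since it doesn't call malloc, there is no libc to link against.

```nasm
; nasm -f elf32 find.asm
; ld -m elf_i386 -o find find.o

global _start

%define DELIMITER ' '

section .text
_start:
    mov esi, basicstring
    lodsb                       ; al = length byte, esi -> first character
    movzx ebx, al
    add ebx, esi                ; ebx -> one past the last character
    mov edi, slices             ; table of (address, length) pairs
    xor ebp, ebp                ; number of pieces found so far

.next_piece:
    mov [edi], esi              ; remember where this piece starts
    xor ecx, ecx                ; length of this piece
.scan:
    lea eax, [esi + ecx]
    cmp eax, ebx                ; reached the end of the string?
    jae .store
    cmp byte [eax], DELIMITER
    je .store
    inc ecx
    jmp .scan
.store:
    mov [edi + 4], ecx          ; remember its length
    add edi, 8
    inc ebp
    lea esi, [esi + ecx + 1]    ; step over the piece and its delimiter
    cmp esi, ebx
    jb .next_piece

; print each piece straight out of the original string (Linux only)
    mov esi, slices
.print:
    mov ecx, [esi]              ; address of the piece
    mov edx, [esi + 4]          ; its length
    mov ebx, 1                  ; STDOUT
    mov eax, 4                  ; __NR_write
    int 80h

    mov ecx, newline
    mov edx, 1
    mov ebx, 1
    mov eax, 4
    int 80h

    add esi, 8
    dec ebp
    jnz .print

    xor ebx, ebx                ; exit status 0
    mov eax, 1                  ; __NR_exit
    int 80h

section .data
basicstring db .end - basicstring - 1
    db "string1 string2 string3"
.end:
newline db 10

section .bss
slices resd 2 * 128             ; room for 128 (address, length) pairs
```

The original string is left untouched, and the pieces are not terminated in any way, so whatever consumes them has to use the stored lengths.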

Best,
Frank

Aurel:
First of all, thank you very much Frank; I now understand a little better
how things work.
It's about a BASIC compiler called EBasic, which has NASM.
This compiler is for Windows only.
Someone helped me too, and here is his code:
BASIC code:

--- Code: ---DECLARE Split(Inp$:STRING,Deliminator:CHAR,RetArray:POINTER),INT
CONST MaxSplit = 17
DEF A$,src:STRING
def I:INT
'def w:pointer
DEF StrPArray[MaxSplit]:INT
A$ = "This string will be"
OPENCONSOLE
'PRINT A$
src=A$:A$=""
W = Split(src," ",StrPArray)
PRINT "Number of strings:",str$(W)
print

FOR I = 0 TO W-1
PRINT *<STRING>(StrPArray[I])
NEXT I

DO
UNTIL INKEY$<>""
END

--- End code ---

and here is assembler code:

--- Code: ---_asm
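Split:  push ebp
        mov ebp, esp
        push esi
        push edi
        push ebx
        mov edi, [ebp+8]            ; edi -> input string (zero-terminated)
        mov esi, [ebp+16]           ; esi -> array that receives the pointers
        xor ecx, ecx                ; character counter (not used afterwards)
        xor ebx, ebx                ; number of substrings found
        movzx eax, byte [ebp+12]    ; al = delimiter, ah = 0
C01:    mov [esi], edi              ; store the start of the current substring
        inc ebx
C00:    cmp byte [edi], 0           ; end of input?
        jz Exit
        inc ecx
        scasb                       ; compare al with [edi], then advance edi
        jnz C00                     ; not the delimiter: keep scanning
        lea esi, [esi+4]            ; next array slot
        mov [edi-1], ah             ; overwrite the delimiter with a zero terminator
        jmp C01
Exit:   mov dword [esi+4], 0        ; zero the slot after the last pointer; can be omitted since we have a return value
        xchg eax, ebx               ; return the substring count in eax
        pop ebx
        pop edi
        pop esi
        leave
        ret 0x0C                    ; callee removes the 12 bytes of arguments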
_endasm

--- End code ---

It works, but the original string is destroyed; that's not important here.
The string is a piece of text inside quotes, and each new string is put in
an array element, as you can see.

Your explanation of how assembler works is great; it's the first time I've seen
an explanation like this, and I understand much better now.
Of course I will try your code too.
Thanks again...

Aurel

 

munair:

--- Quote from: Frank Kotler on September 26, 2010, 03:09:56 PM ---
--- Code: ---; nasm -f elf32 extract.asm
; ld -o extract extract.o -I/lib/ld-linux.so.2 -lc

--- End code ---

--- End quote ---
When linking this I get:

[frank@frank-pc xmpl]$ ld -o extract extract.o -I/lib/ld-linux.so.2 -lc
ld: i386 architecture of input file `extract.o' is incompatible with i386:x86-64 output

and:

[frank@frank-pc xmpl]$ ld -m elf_i386 -o extract extract.o -I/lib/ld-linux.so.2 -lc
ld: skipping incompatible /usr/lib/libc.so when searching for -lc
ld: skipping incompatible /usr/lib/libc.a when searching for -lc
ld: cannot find -lc
ld: skipping incompatible /usr/lib/libc.so when searching for -lc

Frank Kotler:
Dunno.

--- Code: ---apt-get install gcc-multilib

--- End code ---
perhaps?

Or do it in 64 bits?
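
For the 64-bit route, here is a rough, untested sketch of the same extractor ported to x86-64. The file name "extract64.asm" and the dynamic linker path in the ld line are assumptions for a typical x86-64 glibc system; adjust them for yours. Unlike the 32-bit version, it bails out if malloc fails.

```nasm
; nasm -f elf64 extract64.asm
; ld -o extract64 extract64.o -I/lib64/ld-linux-x86-64.so.2 -lc

default rel
global _start
extern malloc

%define DELIMITER ' '

section .text
_start:
    lea rsi, [basicstring]
    lodsb                       ; al = length byte, rsi -> first character
    movzx ebx, al
    add rbx, rsi                ; rbx -> one past the last character
    lea r12, [pointers]         ; array of pointers to extracted strings
    xor r13d, r13d              ; count of extracted strings

.top:
    xor ecx, ecx                ; length of this delimited string
.getlen:
    lea rax, [rsi + rcx]
    cmp rax, rbx                ; reached the end of the string?
    jae .gotlen
    cmp byte [rax], DELIMITER
    je .gotlen
    inc rcx
    jmp .getlen
.gotlen:
    mov r14, rsi                ; malloc may trash rsi and rcx, so keep
    mov r15, rcx                ; copies in callee-saved registers
    lea rdi, [rcx + 1]          ; argument in rdi: length byte + characters
    call malloc wrt ..plt
    test rax, rax
    jz .fail                    ; malloc failed

    mov [r12], rax              ; save the address in "pointers"
    add r12, 8
    inc r13
    mov rdi, rax                ; destination for stosb/movsb
    mov rsi, r14
    mov rcx, r15
    mov al, cl                  ; length byte first
    stosb
    rep movsb                   ; then the characters
    inc rsi                     ; move past the delimiter
    cmp rsi, rbx
    jb .top

    lea r12, [pointers]
.print:
    mov rsi, [r12]              ; address of the extracted string
    movzx edx, byte [rsi]       ; its length
    inc rsi                     ; move past the length byte
    mov edi, 1                  ; STDOUT
    mov eax, 1                  ; __NR_write
    syscall
    lea rsi, [newline]
    mov edx, 1
    mov edi, 1
    mov eax, 1
    syscall
    add r12, 8
    dec r13
    jnz .print

    xor edi, edi                ; exit status 0
    mov eax, 60                 ; __NR_exit
    syscall
.fail:
    mov edi, 1                  ; exit status 1
    mov eax, 60
    syscall

section .data
basicstring db .end - basicstring - 1
    db "string1 string2 string3"
.end:
newline db 10

section .bss
pointers resq 128
```

The main differences from the 32-bit code: the argument to malloc goes in rdi instead of on the stack, system calls use "syscall" with different numbers and registers, and pointers are 8 bytes wide.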

Best,
Frank
