Topic: Help with writing custom C type string functions using NASM

Offline Frank Kotler

  • NASM Developer
  • Hero Member
  • *****
  • Posts: 2667
  • Country: us
Re: Help with writing custom C type string functions using NASM
« Reply #30 on: September 08, 2017, 06:17:52 PM »
Well, "puts" is just a sys_write to stdout. The only "different" thing about it is the null-terminated string. You can call your l_strlen and move the length to edx or do:
Code: [Select]
cmp [ecx + edx], byte 0
; etc.

The rest of 'em are just wrappers around system calls. They'll look a lot like your l_gets. If there's an error, eax will be -ERRNO, and you need to change it to -1. The real C library does this and puts the error number in the global variable "errno". You apparently don't need to do that. You were probably supposed to do that for l_gets, too.
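
For anyone who finds it easier to read in C, the -ERRNO convention can be modeled roughly like this. It is only a sketch: fix_syscall_result and l_errno are hypothetical names that are not part of the assignment, and raw stands for whatever int 0x80 left in eax.

```c
/* Hypothetical stand-in for the C library's global "errno". */
int l_errno = 0;

/* raw models the value left in eax after int 0x80. */
int fix_syscall_result(int raw)
{
    if (raw < 0) {          /* the kernel returned -ERRNO */
        l_errno = -raw;     /* what the real C library would remember */
        return -1;          /* what the caller is supposed to see */
    }
    return raw;             /* success: the result passes through unchanged */
}
```

In this model a raw result of -9 (EBADF on Linux) comes back as -1 with l_errno set to 9, while a raw result of 12 is returned as 12.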

Best,
Frank


Offline turtle13

  • Jr. Member
  • *
  • Posts: 73
« Reply #31 on: September 08, 2017, 09:45:49 PM »
Does this look good for l_puts:

Code: [Select]
bits 32

section .text

global l_puts

l_puts:
        push ebp                ; prologue
        mov ebp, esp   

        push ebx                ; preserve ebx

        mov ebx, [ebp + 8]      ; const char *buf goes into ebx

        .char_loop:
                cmp [ebx], byte 0x0     ; look for null terminator
                je .done

                mov eax, 4              ; sys write
                int 0x80               

                jmp .char_loop


        .done:
                pop ebx                 ; restore ebx

                mov esp, ebp            ; epilogue
                pop ebp
                ret

I'm thinking I would need to write each byte, kind of like l_gets reads in each byte. The sys_write takes three args: int, const char*, size_t. Would I need to set up the stack in the .char_loop so that ebx, ecx, and edx hold each of these arguments? I'm a little confused here because the actual call to l_puts only takes in one argument (or are these called parameters?), so not sure how to set this up.
« Last Edit: September 08, 2017, 09:59:30 PM by turtle13 »

Offline turtle13

« Reply #32 on: September 08, 2017, 10:22:26 PM »
Here is what I have so far for l_write:

Code: [Select]
bits 32

section .text

global l_write

l_write:
        push ebp                ;prologue
        mov ebp, esp

        push ebx

        mov ebx, [ebp + 8]      ; fd stored in ebx
        mov ecx, [ebp + 12]     ; char *buf stored in ecx
        mov edx, [ebp + 16]     ; len stored in edx

        xor esi, esi            ; counter

        cmp edx, 0              ; check for error
        jle .error

        .char_loop:
                mov eax, 4              ; sys write
                int 0x80

                inc ecx
                inc esi

                cmp esi, [ebp + 16]     ; does bytes written = len?
                je .done

                jmp .char_loop



        .error:
                mov eax, -1             ; error
                pop ebx                 ; restore ebx
               
                mov esp, ebp            ; epilogue
                pop ebp
                ret

        .done:
                mov eax, esi            ; return # bytes written
                pop ebx                 ; restore ebx

                mov esp, ebp            ; epilogue
                pop ebp
                ret


Offline Frank Kotler

« Reply #33 on: September 08, 2017, 10:26:54 PM »
Well, no. The one and only parameter (you can call it an argument), the address of the string, goes in ecx. We know that the file descriptor wants to be stdout (1) - that goes in ebx. The length, which you need to find either by calling l_strlen or by finding the zero here, goes in edx. You might want to use a loop like this:
Code: [Select]
; address is in ecx
xor edx, edx
.find:
cmp [ecx + edx], byte 0
jnz .found
inc edx
jmp .find
Or call l_strlen...

Best,
Frank


Offline Frank Kotler

« Reply #34 on: September 08, 2017, 10:41:33 PM »
Code: [Select]
bits 32

section .text

global l_write

l_write:
        push ebp                ;prologue
        mov ebp, esp

        push ebx

        mov ebx, [ebp + 8]      ; fd stored in ebx
        mov ecx, [ebp + 12]     ; char *buf stored in ecx
        mov edx, [ebp + 16]     ; len stored in edx

        xor esi, esi            ; counter
; you don't need a counter, eax does it
; if you do use esi, you need to push/pop it with ebx

        cmp edx, 0              ; check for error
; and eax will be negative if error, not edx
; and you need to do this after sys_write, not before

        jle .error

        .char_loop:
                mov eax, 4              ; sys write
                int 0x80

                inc ecx
                inc esi

                cmp esi, [ebp + 16]     ; does bytes written = len?
                je .done
; eax will be edx... even if some of 'em are garbage
; unless there's an error

                jmp .char_loop



        .error:
                mov eax, -1             ; error
                pop ebx                 ; restore ebx
               
                mov esp, ebp            ; epilogue
                pop ebp
                ret

        .done:
                mov eax, esi            ; return # bytes written
                pop ebx                 ; restore ebx

                mov esp, ebp            ; epilogue
                pop ebp
                ret

Best,
Frank


Offline turtle13

« Reply #35 on: September 08, 2017, 11:03:25 PM »
Code: [Select]
; address is in ecx
xor edx, edx
.find:
cmp [ecx + edx], byte 0
jnz .found
inc edx
jmp .find

Did you mean to say "je .found" instead of "jnz .found"?

Offline Frank Kotler

« Reply #36 on: September 08, 2017, 11:19:42 PM »
Sure enough. My bad.

Best,
Frank


Offline turtle13

« Reply #37 on: September 09, 2017, 06:53:20 PM »
OK so I got a lot going through my head now, back to the l_gets,

can you take a look at the code and let me know where I should go from here?

Code: [Select]
l_gets:
        push ebp                ; prologue
        mov ebp, esp

        push ebx                ; preserve ebx

        mov ebx, [ebp + 8]      ; fd parameter into ebx
        mov ecx, [ebp + 12]     ; char *buf in ecx
        mov edx, [ebp + 16]     ; len in edx
        xor esi, esi            ; counter

        cmp edx, 0              ; if len zero or less, exit
        jle .done

        .char_loop:       
       
                mov eax, 3              ; sys read
                int 0x80

                cmp [ecx], byte 0xA     ; test for linefeed
                je .done

                inc ecx                 ; advance to next byte
                inc esi                 ; +count

                cmp esi, [ebp + 16]     ; does read bytes = len?
                je .done

                jmp .char_loop

               
       
        .done:
                mov eax, esi            ; # bytes read into eax               
                pop ebx                 ; restore ebx
                mov esp, ebp            ; epilogue
                pop ebp
                ret

Offline turtle13

« Reply #38 on: September 09, 2017, 07:57:08 PM »
Frank can you provide a little insight as to why

Code: [Select]
cmp [ecx + edx], byte 0

is used to find a null terminator for "l_puts"

ecx= the pointer to the address of the string, correct? void l_puts(const char *buf)

if edx starts at 0, is the first iteration checking the first character in the string for 0?
Then if not zero, edx increments to 1, does that mean that the second byte in the string (or "character array") is then checked for zero? It's not making sense to me how this is working and seems like we have to take a lot on faith in this programming stuff.

Offline turtle13

« Reply #39 on: September 09, 2017, 08:13:09 PM »
code for l_puts:

Code: [Select]
bits 32

section .text

global l_puts

l_puts:
        push ebp                ; prologue
        mov ebp, esp   

        push ebx                ; preserve ebx
        push edi                ; preserve edi
        push esi                ; preserve esi

        mov ebx, 1              ; 1= stdout     
        mov ecx, [ebp + 12]     ; const char *buf [address of string] goes into ecx
       
        xor edx, edx

        .char_loop:
                cmp [ecx + edx], byte 0         ; look for null terminator
                je .done
               
                mov eax, 4              ; sys write
                int 0x80

                inc edx
                jmp .char_loop

        .done:
                pop esi                 ; restore esi
                pop edi                 ; restore edi               
                pop ebx                 ; restore ebx

                mov esp, ebp            ; epilogue
                pop ebp
                ret

notice I am now preserving and restoring edi and esi registers in addition to ebx, per instructions by my professor

Offline turtle13

« Reply #40 on: September 09, 2017, 09:22:42 PM »
What I've got so far for l_write:

Code: [Select]
bits 32

section .text

global l_write

l_write:
        push ebp                ;prologue
        mov ebp, esp

        push ebx                ; preserve registers
        push edi
        push esi

        mov ebx, [ebp + 8]      ; fd stored in ebx
        mov ecx, [ebp + 12]     ; char *buf stored in ecx
        mov edx, [ebp + 16]     ; len stored in edx

        xor esi, esi            ; counter

        cmp edx, 0              ; check for 0 len
        jle .done

        .char_loop:
                               
                mov eax, 4              ; sys write
                int 0x80

                cmp eax, 0              ; check for error
                jle .error

                inc ecx                 ; move to next char
                inc esi                 ; increment counter

                cmp esi, [ebp + 16]     ; does bytes written = len?
                je .done

                jmp .char_loop

        .error:
                mov eax, -1             ; error
               
                pop esi                 ; restore registers
                pop edi
                pop ebx               
               
                mov esp, ebp            ; epilogue
                pop ebp
                ret

        .done:
                mov eax, esi            ; return # bytes written
               
                pop esi
                pop edi               
                pop ebx                 ; restore registers

                mov esp, ebp            ; epilogue
                pop ebp
                ret

l_write instructions:

int l_write(int fd, char *buf, int len)
write len bytes from buffer buf to file fd. Return the number of bytes actually written or -1 if an error occurs.

Offline Frank Kotler

« Reply #41 on: September 09, 2017, 09:35:43 PM »
If you're using esi, you ought to preserve it. Push it right after ebx and pop it right before, or the other way around. That's easy.

Now... s'pose we're reading from stdin, and the user types nothing but the "enter" key. After the sys_read eax will be 1 and that's what we want to return - but esi is still zero, no? You may want to start off with esi = 1. Suppose the length, as provided by the caller, is 4, and the user types 3 characters and "enter". eax will be 4 and I guess that's what esi will be when we put it back into eax. I may have to try that one. If the user types 4 or more characters before "enter", eax will be 4 and that's what we'll return. The linefeed and perhaps some characters will remain in the "keyboard buffer" to mess us up later unless we flush them. The assignment doesn't say anything about that, so I guess we can ignore it. We may regret that.

Suppose we're reading from a disk file, or socket. We'll read edx bytes, regardless of linefeeds. Your code counts up to the linefeed (if any) and returns that, "as if" we had stopped at the linefeed. But we didn't. Another read from that file will start where we left off, edx bytes into the file, not at the linefeed. This may not be satisfactory. The assignment says to stop at the linefeed. The only way I can think of to do that is to read one byte at a time, ugly as that is. I really don't know what to advise you on this. Best to stick to the assignment, I'm afraid...

If I get to it, I'll download your code and try it. As we have discovered, untested code can have misteaks. :)

Best,
Frank

Aw, jeez, three new messages? I'll get back to ya...


Offline Frank Kotler

« Reply #42 on: September 09, 2017, 11:06:57 PM »
Quote from: turtle13 on September 09, 2017, 07:57:08 PM
Frank can you provide a little insight as to why

Code: [Select]
cmp [ecx + edx], byte 0

is used to find a null terminator for "l_puts"

ecx= the pointer to the address of the string, correct? void l_puts(const char *buf)

if edx starts at 0, is the first iteration checking the first character in the string for 0?
Then if not zero, edx increments to 1, does that mean that the second byte in the string (or "character array") is then checked for zero? It's not making sense to me how this is working and seems like we have to take a lot on faith in this programming stuff.

No faith, just logic. Everything you say is correct and as it should be. If the zero is the first character, the length is zero - we don't want to count the zero as part of the length. If the zero is the second character, the length is 1, etc.
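
The same scan can be written out in C, which may make the indexing easier to see. This is a minimal sketch, not the assignment's code: scan_len is a hypothetical name, buf plays the role of ecx and i plays the role of edx.

```c
#include <stddef.h>

/* buf plays the role of ecx, i plays the role of edx. */
size_t scan_len(const char *buf)
{
    size_t i = 0;               /* xor edx, edx */
    while (buf[i] != 0)         /* cmp [ecx + edx], byte 0 */
        i++;                    /* inc edx */
    return i;                   /* index of the zero byte == length */
}
```

Each pass looks at the byte that sits i bytes past the start of the string, so the index at which the zero turns up is exactly the number of characters before it.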

However...
Code: [Select]
bits 32

section .text

global l_puts

l_puts:
        push ebp                ; prologue
        mov ebp, esp   

        push ebx                ; preserve ebx
        push edi                ; preserve edi
        push esi                ; preserve esi

        mov ebx, 1              ; 1= stdout     
        mov ecx, [ebp + 8]      ; const char *buf [address of string] goes into ecx
; you had [ebp + 12], but it's at [ebp + 8], being the first and only parameter!

        xor edx, edx

        .char_loop:
                cmp [ecx + edx], byte 0         ; look for null terminator
;                je .done
; No, only "found length", not "done"!
    je .found_length
    inc edx
    jmp .char_loop
.found_length:

; now we've got ebx, ecx, and edx where we want 'em               
                mov eax, 4              ; sys write
                int 0x80
    test eax, eax ; just to set flags
    jns .done ; no error
    mov eax, -1
    ; or eax, -1 ; shorter way to do the same thing
;                inc edx
;                jmp .char_loop

        .done:
                pop esi                 ; restore esi
                pop edi                 ; restore edi               
                pop ebx                 ; restore ebx

                mov esp, ebp            ; epilogue
                pop ebp
                ret

Does no harm to preserve registers we don't use.
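
As a cross-check on the control flow, here is roughly the same thing in C: find the length first, then issue one write for the whole string instead of one per character. It is a sketch under assumptions: puts_to_fd is a hypothetical helper that takes the descriptor as a parameter so it can be pointed at a pipe for testing, whereas l_puts always uses 1 (stdout).

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: like l_puts, but the descriptor is a parameter
   (l_puts itself always uses 1, stdout). */
int puts_to_fd(int fd, const char *buf)
{
    size_t len = strlen(buf);           /* the "find the zero" step */
    ssize_t n = write(fd, buf, len);    /* one write for the whole string */
    return n < 0 ? -1 : (int)n;         /* libc write() already reports errors as -1 */
}
```

Calling puts_to_fd(1, "hello\n") would print the line on stdout, which is what the assembly version does with ebx = 1.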

"l_write" is simpler than you've got it. With the exception of making the error -1, depriving the caller of "what" went wrong, it's just sys_write.
Code: [Select]
bits 32

section .text

global l_write

l_write:
        push ebp                ;prologue
        mov ebp, esp

        push ebx                ; preserve registers
        push edi
        push esi

        mov ebx, [ebp + 8]      ; fd stored in ebx
        mov ecx, [ebp + 12]     ; char *buf stored in ecx
        mov edx, [ebp + 16]     ; len stored in edx

;        xor esi, esi            ; counter
; we don't need a counter

        cmp edx, 0              ; check for 0 len
        jle .done
; does no harm to check caller for idiocy
; we don't write anything anyway


;        .char_loop:
                               
                mov eax, 4              ; sys write
                int 0x80

                cmp eax, 0              ; check for error
                jl .error
; was "jle", but strictly speaking, 0 is not an error
                jmp .done               ; no error: eax already holds the byte count
;                inc ecx                 ; move to next char
;                inc esi                 ; increment counter

;                cmp esi, [ebp + 16]     ; does bytes written = len?
;                je .done

;                jmp .char_loop
; we don't need any of that


        .error:
                mov eax, -1             ; error

                pop esi                 ; restore registers
                pop edi
                pop ebx

                mov esp, ebp            ; epilogue
                pop ebp
                ret

        .done:
;                mov eax, esi            ; esi is no longer a counter; eax is already the return value
               
                pop esi
                pop edi               
                pop ebx                 ; restore registers

                mov esp, ebp            ; epilogue
                pop ebp
                ret

No real need to duplicate the entire "clean up and go home". We just need to make eax -1 if it was negative (depriving the caller of useful information) and leave it alone if no error. Does no harm.

It probably would have been a good idea to make each of these functions a separate "topic". A little late now.

Now... see if I still feel like looking at l_gets...

Best,
Frank


Offline turtle13

« Reply #43 on: September 10, 2017, 02:46:43 AM »
ok, nearly final l_write:

Code: [Select]

bits 32

section .text

global l_write

l_write:
        push ebp                ;prologue
        mov ebp, esp

        push ebx                ; preserve registers
        push edi
        push esi

        mov ebx, [ebp + 8]      ; fd stored in ebx
        mov ecx, [ebp + 12]     ; char *buf stored in ecx
        mov edx, [ebp + 16]     ; len stored in edx

        cmp edx, 0              ; check for 0 len
        jle .done
                               
        mov eax, 4              ; sys write
        int 0x80

        cmp eax, 0              ; check for error (when eax is less than zero)
        jl .error

        .done:
               
                pop esi
                pop edi               
                pop ebx                 ; restore registers

                mov esp, ebp            ; epilogue
                pop ebp
                ret

        .error:
                mov eax, -1             ; error
                jmp .done

I'm not following how this would return the number of bytes written (which is why I was using esi as a counter: every byte written would increment it by 1, and then that is moved into eax before returning), unless that part needs to be amended.

Offline Frank Kotler

« Reply #44 on: September 10, 2017, 02:54:38 AM »
This is what I've got for l_gets. It's pretty much your code. I moved "inc esi" up to the top so our first try is 1, not 0. If esi==len, we do want to do that read, in case the LF is there. Now we do. I cut back to reading one byte at a time, little as I like it. Fugly! I did not attempt to "flush the buffer". I indicated where we might want to - only if we're reading from stdin!

Code: [Select]
; nasm -f elf32 l_gets.asm -d TESTMAIN
; ld -o l_gets l_gets.o

bits 32

%ifdef TESTMAIN

section .bss
buf resb 80
fd resd 1

section .data
filename db `l_gets.asm\0` ; ourself - we know it's there

section .text
global _start
_start:

mov eax, 5 ; sys_open
mov ebx, filename
xor ecx, ecx
xor edx, edx
int 80h
test eax, eax
js exit ; bail out if error
mov [fd], eax

; test a call to it

; try multiple calls if we're reading file
; just to make sure we're stopping at LF
; and can continue from there
mov esi, 7
top:

push 80 ; length
push buf
push dword [fd]
call l_gets
add esp, 4 * 3

; print what we l_getsed - l_got?

mov edx, eax ; length read
mov eax, 4  ;sys_write
mov ebx, 1 ; stdout
mov ecx, buf
int 80h

; only if we're doing multiple reads
dec esi
jnz top

exit:
                mov ebx, eax            ; return value is number of bytes written (len)
                mov eax, 1              ; sys exit
                int 0x80
%endif
;-----------------------------
section .text

global l_gets

l_gets:
        push ebp                ; prologue
        mov ebp, esp

        push ebx                ; preserve ebx
push esi

        mov ebx, [ebp + 8]      ; fd parameter into ebx
        mov ecx, [ebp + 12]     ; char *buf in ecx
        mov edx, [ebp + 16]     ; len in edx
        xor esi, esi            ; counter

        cmp edx, 0              ; if len zero or less, exit
        jle .done

mov edx, 1 ; read one byte at a time. ugh!

        .char_loop:       
        inc esi ; increment count first
                mov eax, 3              ; sys read
                int 0x80

                cmp [ecx], byte 0xA     ; test for linefeed
                je .done

                inc ecx                 ; advance to next byte

                cmp esi, [ebp + 16]     ; does read bytes = len?
                je .done
; if this happens, we didn't find a LF
; if we're reading stdin, this indicates overflow
; we might want to flush OS's input buffer ("keyboard buffer")

                jmp .char_loop
       
        .done:
                mov eax, esi            ; # bytes read into eax               

        pop esi
        pop ebx                 ; restore ebx

                mov esp, ebp            ; epilogue
                pop ebp
                ret
;-----------------------------------------------


That's as far as I got. I'm not sure it's "complete". See what you think...
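
One thing the assembly above does not do is look at what sys_read returned. A C model of the same byte-at-a-time loop, with that check added, might look like the sketch below. The name l_gets_model is hypothetical, and returning -1 on a read error and stopping at end of file are assumptions, since the assignment text for l_gets is not quoted here.

```c
#include <unistd.h>

/* Hypothetical C model of the one-byte-at-a-time l_gets loop. */
int l_gets_model(int fd, char *buf, int len)
{
    int count = 0;
    while (count < len) {
        ssize_t n = read(fd, buf + count, 1);   /* one byte per call */
        if (n < 0)
            return -1;          /* assumption: report errors as -1 */
        if (n == 0)
            break;              /* end of file: nothing more to read */
        count++;                /* the byte just read is counted */
        if (buf[count - 1] == '\n')
            break;              /* stop after the linefeed */
    }
    return count;               /* bytes stored, linefeed included */
}
```

Because only one byte is consumed per call, the next l_gets_model on the same descriptor picks up right after the linefeed, which is the behavior the multiple-read test above is checking for.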

Best,
Frank

Ah, again we're bumping into each other. Your l_write looks good at first glance. It returns the number of bytes written because that's what sys_write does!
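
To make that concrete, here is a rough C model of the wrapper. It is a sketch, not the assignment's assembly: l_write_model is a hypothetical name, and treating a non-positive len as "0 bytes written" is an assumption.

```c
#include <unistd.h>

/* Hypothetical C model of l_write. */
int l_write_model(int fd, const char *buf, int len)
{
    if (len <= 0)
        return 0;               /* assumption: nothing to write counts as 0 bytes */
    ssize_t n = write(fd, buf, (size_t)len);    /* the single write call */
    if (n < 0)
        return -1;              /* any error is reported as -1 */
    return (int)n;              /* byte count comes straight from the call */
}
```

No counter is needed because the count is the call's own return value; the wrapper only has to turn an error into -1.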