NASM - The Netwide Assembler

Title: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 04, 2017, 06:50:15 AM
For a class assignment I must write custom versions of C type functions such as:

strlen, strcmp, gets, puts, write, open, close, exit

From what I understand, this requires using the cdecl calling convention so I will be preserving and restoring the ebx, edi, esi, and ebp registers, and the caller will clean the stack. eax holds the return value.

So far I have a skeleton for the strlen-type function (called l_strlen), which returns the number of characters in a given string as an int:

Code: [Select]
bits 32

section .data
; variables go here:
; var_name db values
string1 db 'string', 0          ; null terminated string
string1_len equ $ - string1     ; length of string1

section .text

global l_strlen


l_strlen:
        xor eax, eax            ; zero eax
        push eax                ; preserve eax
        push ebx                ; preserve ebx
        push edi                ; preserve edi
        push esi                ; preserve esi       
       
        push ebp                ; prologue: set up stack frame
        mov ebp, esp

        .char_loop
                ; while the byte (char) being compared is not "0"
                ;       add one to ecx
                ;       jmp .char_loop
                ; if the byte (char) is "0" and no characters remaining (meaning null terminated)
                ;       jmp .end_loop

        .end_loop

                mov esp, ebp            ; epilogue: restore caller's frame pointer
                pop ebp

                ret
                pop eax                 ; this is where the final return value is located
                pop esi                 ; restore esi
                pop edi                 ; restore edi
                pop ebx                 ; restore ebx

Questions about my code:

- for the .char_loop I have pseudocode, I'm trying to figure out exactly how to accomplish this task (or if the task is even appropriate?)

- How do I manipulate the code so that the string being measured is not statically declared like I did with variable 'string1' (such that 'l_strlen(any_string)') ?

- anything else that seems off to you (or better yet, is anything even correct?)
Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 04, 2017, 08:28:43 AM
It's pretty much correct, but you've got some things out of order.

You don't need to "preserve" registers that you don't use. For this simple task, we can get by with registers that we're allowed to alter. That simplifies things. The "prologue", as the name suggests, wants to be the first thing in your function...

Code: [Select]
bits 32

section .data
; variables go here:
; var_name db values
string1 db 'string', 0          ; null terminated string
string1_len equ $ - string1     ; length of string1

section .text

;-----------------------------------------------
; this is a "test main" it should not be in your final code
        global _start
        _start:
        push string1 ; address of string1
        call l_strlen
        add esp, 4 ; "remove" parameter
; length is returned in eax
; make it our exit code
        mov ebx, eax
        mov eax, 1 ; sys_exit
        int 0x80
; end of "test main"
;-------------------------------------------

global l_strlen

l_strlen:
;        xor eax, eax            ; zero eax
;        push eax                ; preserve eax
;        push ebx                ; preserve ebx
;        push edi                ; preserve edi
;        push esi                ; preserve esi       
; this part we do need:
       
        push ebp                ; prologue: set up stack frame
        mov ebp, esp

; if we needed to preserve registers, do it here

        xor eax, eax ; since we want the result in eax
        mov ecx, [ebp + 8] ; first (only) parameter
        .char_loop
                ; while the byte (char) being compared is not "0"
; for clarity: the byte we're looking for is the number zero
; not the character "0". They're not the same thing!
        mov dl, [ecx]
        cmp dl, byte 0
        jz .end_loop
        inc eax ; increase counter
        inc ecx ; move to next character
                ;       add one to ecx
                ;       jmp .char_loop
        jmp .char_loop
                ; if the byte (char) is "0" and no characters remaining (meaning null terminated)
                ;       jmp .end_loop

        .end_loop

; if we had preserved registers, pop 'em here

                mov esp, ebp            ; epilogue: restore caller's frame pointer
                pop ebp

                ret
 
; this stuff after "ret" would never be reached anyway
;               pop eax                 ; this is where the final return value is located
;                pop esi                 ; restore esi
;                pop edi                 ; restore edi
;                pop ebx                 ; restore ebx


That's untested. I should know better than to post untested code, but it's late here...

As you can see, I've added a "test main" so that you can assemble and link the code and run it. As you probably know, we can see the exit code by typing "echo $?". Only the low byte of the exit code survives (0 to 255), but that should be enough for short strings. I think I've got it right, but no promises...

Best,
Frank

Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 05, 2017, 02:46:12 AM
Frank your advice worked perfectly, I compiled the program with the short "main" function and it is returning the length of "string1" as exit code!

Now I'm assuming that I don't need to leave in the
Code: [Select]
string1 db 'string', 0          ; null terminated string
string1_len equ $ - string1     ; length of string1
part of the code because this function should be used to examine any length string. Should I just delete those two lines or is anything else required to make this happen?


Why is "dl" used in 'mov dl, [ecx]' ? If I understand it, dl is the low order byte of the edx register, but how do edx and dl come into play here?

Thanks again!
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 05, 2017, 03:26:16 AM
Moving on to the next C function "strcmp"

Instructions:
int l_strcmp(char *str1, char *str2);
return 0 if str1 and str2 are equal, return 1 if they are not. Note that this is not the same definition as the C standard library function strcmp.

Here is my code so far:

Code: [Select]

bits 32

section .data

string1 db 'hello', 0
string2 db 'hello', 0
string3 db 'Hello!', 0


section .text

global l_strcmp

l_strcmp:

        push ebp                ; prologue: set up stack frame
        mov ebp, esp

        xor eax, eax            ; zero eax to prepare for storing result (0= equal, 1= not equal)
        mov ecx, [ebp + 8]      ; first parameter (string1) stored in ecx
        mov edx, [ebp + 12]     ; second parameter (string2) stored in edx

        .char_loop:
                ; code to compare every character in both strings
                mov cl, [ecx]           ; move the current character into the cl segment of ecx
                mov dl, [edx]           ; move the current character into the dl segment of edx
                cmp dl, cl
                jne .done_1                ; if char in string1 != string2, exit with result 1
               
                ; how to examine if the null terminator has been met and both strings match?               
                jmp .char_loop             ; continue examining characters


        .done_1:
               
                mov eax, 1              ; returns 1 when strings do not match
                mov esp, ebp            ; epilogue: restore caller's frame pointer
                pop ebp
                ret

        .done_0:

                mov eax, 0              ; returns 0 when strings do match
                mov esp, ebp
                pop ebp
                ret

A bit of misunderstanding on what "cl" and "dl" are doing.. need some clarification on that, as well as if the loops appear to operate properly.
Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 05, 2017, 03:40:55 AM
Good. What I found when I tried it was that Nasm burped up a couple of warnings about the lack of colons on a couple of labels. Just put colons on 'em, or that warning can be turned off.

The "string" can be considered part of the "test main". You don't need it in the final code.

The "dl" register was just someplace to put the single byte we're looking at. You don't need that, either. Could be done as:
Code: [Select]
cmp [ecx], byte 0
I was just trying to implement "something" from your pseudo-code. edx, and its "parts" dl and dh, are "volatile" according to the cdecl calling convention. We don't have to preserve it... so I used it.

We can get by without the stack frame, too. If we don't meddle with ebp, the first parameter is at [esp + 4]. We probably "should" use a stack frame, though - it allows a debugger to do a "back trace" to see where we were called from.

If you'll step into the museum for a moment... In 16-bit code, only bx and bp could be used for "base" registers. [sp + ?] was not a valid addressing mode. We had no choice but to...
Code: [Select]
push bp
mov bp, sp
mov ax, [bp + 4]
or whatever. 32-bit addressing modes are much more flexible - any register can be a "base" register, so we can use [esp + 4]. etc. Still, it is common to set up a stack frame...
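
A minimal sketch of the frameless variant described above, assuming the same cdecl signature as the l_strlen earlier in the thread (one pointer parameter, result in eax). The name l_strlen_noframe is made up for illustration, and the code is untested:

Code: [Select]
bits 32

section .text

global l_strlen_noframe         ; hypothetical name, not from the assignment

l_strlen_noframe:
        mov ecx, [esp + 4]      ; no frame: the parameter sits just above the return address
        xor eax, eax            ; eax = running count, and also the return value
.next:
        cmp byte [ecx + eax], 0 ; is this the terminating zero byte?
        je .done
        inc eax                 ; count it and look at the next byte
        jmp .next
.done:
        ret                     ; nothing was pushed, so there is no epilogue

Without a frame pointer there is nothing to restore, but any push inside the function would shift the parameter's offset from esp, which is one more reason the ebp frame is the common choice.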

If I'm feeling ambitious, I may work up a "super short" version of this. Probably not...

Best,
Frank

Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 05, 2017, 03:48:36 AM
You're getting ahead of me... you Hare! :)

Quote
A bit of misunderstanding on what "cl" and "dl" are doing..
Since cl and dl are parts of ecx and edx, they're trashing ecx and edx so they no longer will point to your strings. Use some other 8-bit registers - al and ah, perhaps.

Best,
Frank

Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 05, 2017, 03:54:54 AM
^ I was thinking that as I was doing it, that edx and ecx would get messed up somehow. This assembly stuff is so strict but at the same time the wild wild west in how you want to handle data and instructions

So here is my final version of the strcmp:

Code: [Select]
bits 32

section .data

string1 db 'hello', 0
string2 db 'hello', 0
string3 db 'Hello!', 0


section .text

global l_strcmp

l_strcmp:

        push ebp                ; prologue: set up stack frame
        mov ebp, esp

        xor eax, eax            ; zero eax to prepare for storing result (0= equal, 1= not equal)
        mov ecx, [ebp + 8]      ; first parameter (string1) stored in ecx
        mov edx, [ebp + 12]     ; second parameter (string2) stored in edx

        .char_loop:
                ; code to compare every character in both strings
                cmp [ecx], [edx]        ; compare the characters in the ecx, edx registers
                jne .done_1             ; if char in string1 != string2, exit with result 1
                cmp ecx, byte 0         ; tests for null terminator
                je .done_0              ; jump to done if null terminator         
                jmp .char_loop             ; continue examining characters


        .done_1:
               
                mov eax, 1              ; returns 1 when strings do not match
                mov esp, ebp            ; epilogue: restore caller's frame pointer
                pop ebp
                ret

        .done_0:

                mov eax, 0              ; returns 0 when strings do match
                mov esp, ebp
                pop ebp
                ret

I feel I've built a suspension bridge on quicksand with this one

*OK so I just realized I forgot to increment ecx and edx. I would add:

Code: [Select]
inc ecx
inc edx

in the .char_loop between je .done_0 and jmp .char_loop
Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 05, 2017, 04:01:20 AM
I don't think that'll even assemble, will it?
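
For reference, a hedged sketch of one way around the error, assuming the same l_strcmp contract (0 if equal, 1 if not). x86 has no cmp form with two memory operands, so one byte has to be loaded into a register first; and "cmp ecx, byte 0" tests the pointer itself, not the byte it points at. The name l_strcmp_sketch is hypothetical and the code is untested:

Code: [Select]
bits 32

section .text

global l_strcmp_sketch          ; hypothetical name

l_strcmp_sketch:
        push ebp
        mov ebp, esp
        mov ecx, [ebp + 8]      ; str1
        mov edx, [ebp + 12]     ; str2
.char_loop:
        mov al, [ecx]           ; load the current byte of str1
        cmp al, [edx]           ; register against memory is a legal form
        jne .differ             ; bytes differ: not equal
        test al, al             ; same byte in both: was it the zero terminator?
        jz .same                ; yes: both strings ended together
        inc ecx
        inc edx
        jmp .char_loop
.differ:
        mov eax, 1
        jmp .out
.same:
        xor eax, eax
.out:
        mov esp, ebp
        pop ebp
        ret

Only eax, ecx and edx are touched, so nothing needs to be preserved under cdecl.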

Best,
Frank

Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 05, 2017, 04:09:59 AM
nope, giving me an error with line "cmp [ecx], [edx]"

*just did it!! Returns 1 when strings are different, 0 when strings are the same!

Code: [Select]

bits 32

section .data

string1 db 'hello', 0
string2 db 'hello', 0
string3 db 'Hello!', 0


section .text

;-----------------------------------------------
; this is a "test main" it should not be in your final code
        global _start
        _start:
        push string1 ; address of string1
        push string3
        call l_strcmp
        add esp, 8 ; "remove" parameters
; result (0 or 1) is returned in eax
; make it our exit code
        mov ebx, eax
        mov eax, 1 ; sys_exit
        int 0x80
; end of "test main"
;-------------------------------------------

global l_strcmp

l_strcmp:

        push ebp                ; prologue: set up stack frame
        mov ebp, esp

        xor eax, eax            ; zero eax to prepare for storing result (0= equal, 1= not equal)
        mov ecx, [ebp + 8]      ; first parameter (string1) stored in ecx
        mov edx, [ebp + 12]     ; second parameter (string2) stored in edx

        .char_loop:
                ; code to compare every character in both strings
                mov al, [ecx]
                mov ah, [edx]               
                cmp al, ah        ; compare the two bytes just loaded from [ecx] and [edx]
                jne .done_1             ; if char in string1 != string2, exit with result 1
                cmp al, byte 0         ; tests for null terminator
                je .done_0              ; jump to done if null terminator
                inc ecx
                inc edx         
                jmp .char_loop             ; continue examining characters


        .done_1:
               
                mov eax, 1              ; returns 1 when strings do not match
                mov esp, ebp            ; epilogue: restore caller's frame pointer
                pop ebp
                ret

        .done_0:

                mov eax, 0              ; returns 0 when strings do match
                mov esp, ebp
                pop ebp
                ret
Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 05, 2017, 04:53:12 AM
There ya go!

Now... the real C "gets()" is notoriously unsafe. Some versions of gcc will warn if you try to use it. Instead, make the caller tell you how big the buffer is, and don't "get" any more than that. Please!

Best,
Frank

Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 05, 2017, 05:27:49 AM
Now on to the l_gets:

instructions:

int l_gets(int fd, char *buf, int len);
read at most len bytes from file fd, placing them into buffer buf. Terminate early if a newline character ('\n', 0x0A) is read. If a newline character is encountered, it should be stored into the output buffer and counted in the total number of bytes read. Return the total number of bytes read (which may be zero if end of file is reached or an error occurs). This function does not place a null termination character after the last character read. That is the responsibility of the caller.


Here is some code I have so far, I just want to make sure I am setting it up correctly:


Code: [Select]
bits 32

section .data



section .text

global l_gets

l_gets:
        push ebp                ; prologue, set up stack frame
        mov ebp, esp

        xor eax, eax            ; zero eax to prepare for storing return result

        mov ecx, [ebp + 8]      ; third parameter (int len) stored into ecx
        mov edx, [ebp + 12]     ; second parameter (char *buf) stored into edx
        mov esi, [ebp + 16]     ; first parameter (int fd) stored into esi

^since parameters for cdecl are stored right to left, that is why I am adding to the stack like that. Not sure if this is correct.

I'm lost as to where and how to store the buffer data. Should I get the value for len, and loop that many times while writing the data to the buffer (which would be edx according to my code above)?
Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 05, 2017, 06:27:49 AM
Looks remarkably like sys_read, does it not?

Code: [Select]
bits 32

section .data



section .text

global l_gets

l_gets:
        push ebp                ; prologue, set up stack frame
        mov ebp, esp

        xor eax, eax            ; zero eax to prepare for storing return result
; going to need it for the system call number, no?

; going to need to preserve ebx
push ebx

        mov ecx, [ebp + 8]      ; third parameter (int len) stored into ecx
; fd - going to want it in ebx

        mov edx, [ebp + 12]     ; second parameter (char *buf) stored into edx
; going to want it in ecx

        mov esi, [ebp + 16]     ; first parameter (int fd) stored into esi
; max length - going to want it in edx

; now do your sys_read
; if error (eax negative) we want it to be zero
; that's what it says...
; otherwise number of characters - like sys_read

pop ebx

; epilogue...

That's how I understand it, anyway...
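
For reference, a sketch of the cdecl stack layout the comments above are hinting at. The caller pushes the parameters right to left, so the first parameter ends up closest to the return address. The caller name and the 64-byte buffer are placeholders, and l_gets is assumed to live in another object file; untested:

Code: [Select]
bits 32

section .bss
buf     resb 64                 ; placeholder buffer

section .text

extern l_gets                   ; assumed to be defined elsewhere
global caller_sketch            ; hypothetical name

caller_sketch:
        push dword 64           ; len (third parameter, pushed first)
        push buf                ; buf (second parameter)
        push dword 0            ; fd  (first parameter, pushed last; 0 = stdin)
        call l_gets             ; pushes the return address
        add esp, 12             ; the caller removes the three parameters
        ret

; Inside l_gets, after "push ebp" / "mov ebp, esp":
;   [ebp + 0]   saved ebp
;   [ebp + 4]   return address
;   [ebp + 8]   fd   (first parameter)
;   [ebp + 12]  buf  (second parameter)
;   [ebp + 16]  len  (third parameter)

So the offsets grow in the same order as the C prototype, not in push order.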

You may want to "flush" any excess the pesky user types...

Best,
Frank

Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 05, 2017, 07:48:54 PM
Here is what I came up with so far:

Code: [Select]
bits 32

section .data



section .text

global l_gets

l_gets:
        push ebp                ; prologue, set up stack frame
        mov ebp, esp

        xor eax, eax            ; zero eax to prepare for syscall #
        push ebx                ; preserve ebx

        mov ebx, [ebp + 8]      ; fd parameter goes into ebx
        mov ecx, [ebp + 12]     ; char *buf stored into ecx
        mov edx, [ebp + 16]     ; len stored into edx

        mov eax, 3              ; sys call for read
        int 0x80

        ; read data onto stack:
        .buf_loop:
                ; read each character one at a time, increment counter (in eax), when counter matches len, jump out of loop
                xor eax, eax            ; zero eax to be used for counter
                push ebx                ; push the character onto stack
                inc ebx                 ; advance to next character
                inc eax                 ; advance the counter
                cmp edx, eax
                je .done

        .loop1:
       

                cmp register, byte 0x0A           ; check for newline, exit loop if true
                je .done

        .done:
               

Hopefully the comments are enough to tell you about what I am trying to do here..
Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 05, 2017, 10:28:41 PM
Well... no...
Code: [Select]
bits 32

section .data



section .text

global l_gets

l_gets:
        push ebp                ; prologue, set up stack frame
        mov ebp, esp

        xor eax, eax            ; zero eax to prepare for syscall #
        push ebx                ; preserve ebx

        mov ebx, [ebp + 8]      ; fd parameter goes into ebx
        mov ecx, [ebp + 12]     ; char *buf stored into ecx
        mov edx, [ebp + 16]     ; len stored into edx

        mov eax, 3              ; sys call for read
        int 0x80

Up to here, I follow you. In fact, it looks like you're about done...

Code: [Select]
        ; read data onto stack:
        .buf_loop:
                ; read each character one at a time, increment counter (in eax), when counter matches len, jump out of loop
                xor eax, eax            ; zero eax to be used for counter
If you zero eax in the loop, it's going to run for a long time!
Code: [Select]
                push ebx                ; push the character onto stack
                inc ebx                 ; advance to next character
Last I knew, ebx was your file descriptor...

Code: [Select]
                inc eax                 ; advance the counter
                cmp edx, eax
                je .done
Fair enough... if you don't zero eax in the loop...
Code: [Select]
        .loop1:
       

                cmp register, byte 0x0A           ; check for newline, exit loop if true
                je .done

        .done:
I don't see where we "loop", and to where... The last part of it won't even assemble!

After the sys_read, your data's in the buffer that the caller specified, and eax holds bytes read, including the linefeed that ends input. At least that's true if we're reading from stdin. I'm less sure of how sys_read will behave on a "real file" (or, for that matter, if stdin is redirected). If it's a "text file", okay, but what if it's a "binary file"? Are we expected to stop at any number 10 we encounter? I think of "gets()" as being exclusively for stdin, but your assigned "l_gets" is apparently different. I may have to experiment and see what happens on a "real file"...

I mentioned up above that you might want to "flush" any excess. That would apply only to stdin.

Later,
Frank

Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 06, 2017, 02:04:54 AM
The point of this assignment is to use these functions for our next assignment, which makes a socket call to a web server and downloads a .html or .txt file, so it is supposed to be reading plain text (no binary).

I played with it a little more.. I would like to use the stack as the buffer and push each byte onto the stack, and use esp as the pointer to the buffer.

Code: [Select]
bits 32

section .data



section .text

global l_gets

l_gets:
        push ebp                ; prologue, set up stack frame
        mov ebp, esp

        xor eax, eax            ; zero eax to prepare for syscall #
        push ebx                ; preserve ebx

        mov ebx, [ebp + 8]      ; fd parameter goes into ebx
        mov ecx, [ebp + 12]     ; char *buf stored into ecx
        mov edx, [ebp + 16]     ; len stored into edx

        add esp, 12             ; will use esp for the pointer to the buffer, the bytes to be read will be pushed onto stack
       
        cmp edx, 0              ; if len is zero or less, exit program
        jle .done       
        ; read data onto stack:
        .buf_loop:
               
                mov eax, 3              ; sys call for read, to begin reading bytes
                int 0x80               
                ; read each character one at a time, increment counter (in eax), when counter matches len, jump out of loop
                push esp

                add esp, 4              ; advance to next character
                inc eax                 ; advance the counter
                cmp edx, eax
                je .done

; ignore stuff below this for now
        .loop1:
       

                cmp register, byte 0x0A           ; check for newline, exit loop if true
                je .done

        .done:

is this making sense?
Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 06, 2017, 03:50:23 AM
Well, no. This may be my fault. Shortly after my last post, I lost my computer, and it's taken me 'til just now to beat it into rebooting. I'm pretty frazzled, and haven't looked at any of the stuff I said I'd look at.

You're pretty far from a socket call. You can read from a socket with sys_read, but recv is more common. I don't see where this is going at all.

You can use the stack for a buffer. You want to subtract something from esp, not add it. You almost certainly do not want to use esp as a pointer into it.
Code: [Select]
push esp
add esp, 4
gets you right back where you were before pushing esp. It may be my confused mental state, but I don't see what you're trying to accomplish here. You can not push a byte. You can push a word, but you don't usually want to.
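
A rough sketch of reserving a stack buffer the way described above: subtract from esp to make room, then address the space relative to ebp. The function name and the 16-byte size are arbitrary choices for illustration; untested:

Code: [Select]
bits 32

section .text

global stack_buf_sketch         ; hypothetical name

stack_buf_sketch:
        push ebp
        mov ebp, esp
        sub esp, 16             ; reserve 16 bytes of local space below ebp
        push ebx                ; callee-saved, and sys_read needs it for the fd

        mov eax, 3              ; sys_read
        xor ebx, ebx            ; fd 0 = stdin
        lea ecx, [ebp - 16]     ; ecx -> start of the local buffer
        mov edx, 16             ; read at most 16 bytes
        int 0x80                ; eax = bytes read, or a negative error code

        pop ebx
        mov esp, ebp            ; releases the 16 bytes
        pop ebp
        ret                     ; returns the sys_read result in eax

The buffer only exists until the function returns, so this suits scratch space, not data the caller needs to keep.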

You can read bytes one at a time with sys_read. It'll be slower than a gut-shot wolf bitch with nine suckling pups dragging a number nine trap uphill in a snowstorm, but you can do it. I don't see the point, when sys_read does exactly what your assignment describes.

Let me try to get myself organized and see if I can get back into this.

Best,
Frank

Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 06, 2017, 06:10:46 AM
Frank, you are killing me here with your colloquialisms  ;D ;D I needed the laugh after the stress this class is putting me through

So I cleaned up the code a bit and I think got it closer to where I need to be:

Code: [Select]
bits 32

section .data



section .text

global l_gets

l_gets:
        push ebp                ; prologue, set up stack frame
        mov ebp, esp

        xor eax, eax            ; zero eax to prepare for syscall #
        push ebx                ; preserve ebx

        mov ebx, [ebp + 8]      ; fd parameter goes into ebx
        mov ecx, [ebp + 12]     ; char *buf stored into ecx
        mov edx, [ebp + 16]     ; len stored into edx
       
        cmp edx, 0              ; if len is zero or less, exit program
        jle .done       
        ; read data onto stack:
        .buf_loop:
   
                push esp
                cmp esp, byte 0x0A         ; if a newline character is encountered, exit
                je .done       
                sub esp, 4              ; will use esp for the pointer to the buffer, the bytes to be read will be pushed onto stack           
                mov eax, 3              ; sys call for read, to begin reading bytes- for sys read, ebx= int (fd), ecx= char, edx= size_t (len)
                int 0x80               
                ; read each character one at a time, increment counter (in eax), when counter matches len, jump out of loop

                inc eax                 ; advance the counter
                cmp edx, eax
                je .done
                jmp .buf_loop           ; continue


        .done:
                mov eax, 1              ; sys exit
                mov ebx, edx            ; return value is number of bytes written (len)
                int 0x80

Maybe you could help me write a test main like you did before so I can test this one out? It is at least "compiling" (or do we say assembling?) fine.
Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 06, 2017, 09:08:24 AM
We say "assembling". It is correct to call it "compiling" (an assembler is a compiler for assembly language)... but the asm-heads will think you're a newbie if you do.

This is a bare minimum. I commented out a lot of your code. You do a bunch of stuff with esp before you even read anything. I moved the exit code up into the "test main" where it belongs. Your caller is not likely to be pleased if you exit in the middle of the function. The function wants an epilogue and return (with the count in eax... where sys_read puts it).

It only tests reading from stdin. In order to test with a file, we'd need a file descriptor. We could put a sys_open in the test main, or... you're going to write one, right?
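
If it helps, here is a hedged sketch of a test main that gets a file descriptor with sys_open (system call 5 on 32-bit Linux) and hands it to l_gets. The file name test.txt and the 64-byte buffer are placeholders, and l_gets is assumed to be linked in from a separate object file and to follow cdecl (so esi and edi survive the call). Untested:

Code: [Select]
bits 32

section .data
filename db 'test.txt', 0       ; placeholder file name

section .bss
buf     resb 64

section .text

extern l_gets                   ; assumed to be in another object file
global _start

_start:
        mov eax, 5              ; sys_open
        mov ebx, filename
        xor ecx, ecx            ; flags = O_RDONLY (0)
        xor edx, edx            ; mode, unused when not creating a file
        int 0x80                ; eax = fd, or a negative error code
        test eax, eax
        js .fail
        mov esi, eax            ; keep the fd

        push dword 64           ; len
        push buf
        push esi                ; fd
        call l_gets
        add esp, 12
        mov edi, eax            ; keep the byte count

        mov eax, 6              ; sys_close
        mov ebx, esi
        int 0x80

        mov ebx, edi            ; exit code = bytes read (low byte only)
        jmp .exit
.fail:
        mov ebx, 255            ; arbitrary "open failed" exit code
.exit:
        mov eax, 1              ; sys_exit
        int 0x80
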

You seem to be very concerned with finding that linefeed. It's at [ecx + eax -1] - sys_read stops when it sees it. If it's not there, the pesky user has typed more than we had room for... and we probably should "flush" that. If we don't, it'll screw up the next read... which may be after the program has exited! Try it! Type 10 characters and "ls" before you hit "enter" and see what happens. "ls" is harmless, but "rm ." is not!

Reading from a real file won't do that. It will stop after edx bytes and there's no "keyboard buffer". It "may" stop earlier if it sees a linefeed. I kinda don't think so, but I need to try it. If you're going to be reading from a socket, what you'll actually see is carriage return linefeed pairs. HTTP is fussy about that! It would make a lot of sense to keep reading past them until we've got the whole file or a full buffer. But the assignment seems to say it wants us to stop...

Code: [Select]
; nasm -f elf32 l_gets.asm -d TESTMAIN
; ld -m elf_i386 -o l_gets l_gets.o

bits 32

%ifdef TESTMAIN
section .data

section .bss
buf resb 10

section .text
global _start
_start:

; test a call to it

push 10 ; length
push buf
push 0 ; stdin
call l_gets
add esp, 4 * 3
mov esi, eax ; save the byte count l_gets returned

;print what we l_getsed

mov eax, 4  ;sys_write
mov ebx, 1 ; stdout
mov ecx, buf
mov edx, esi ; only the bytes actually read
int 80h

exit:
                mov eax, 1              ; sys exit
                mov ebx, esi            ; exit code is the number of bytes read
                int 0x80
%endif
;-----------------------------
section .text

global l_gets

l_gets:
        push ebp                ; prologue, set up stack frame
        mov ebp, esp

        xor eax, eax            ; return 0 if we bail out early because len <= 0
        push ebx                ; preserve ebx

        mov ebx, [ebp + 8]      ; fd parameter goes into ebx
        mov ecx, [ebp + 12]     ; char *buf stored into ecx
        mov edx, [ebp + 16]     ; len stored into edx
       
        cmp edx, 0              ; if len is zero or less, skip the read and return
        jle .done       
        ; read data into the caller's buffer:
        .buf_loop:
   
;                push esp
;                cmp esp, byte 0x0A         ; if a newline character is encountered, exit
;                je .done       
;                sub esp, 4              ; will use esp for the pointer to the buffer, the bytes to be read will be pushed onto stack           


                mov eax, 3              ; sys_read: ebx = fd, ecx = buf, edx = len
                int 0x80                ; eax = bytes read (0 at end of file, negative on error)
                ; one call fills the buffer; no per-character loop is needed here

;                inc eax                 ; advance the counter
;                cmp edx, eax
;                je .done
;                jmp .buf_loop           ; continue


        .done:
                pop ebx                 ; restore caller's reg
; epilogue
                mov esp, ebp
                pop ebp
                ret
Tested (a little) but incomplete...
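
One possible shape for the "flush" mentioned above, as a sketch only: read stdin one byte at a time into scratch space until a linefeed (or end of input) turns up. The name flush_stdin_sketch is hypothetical, it assumes fd 0 is a terminal in its usual line-buffered mode, and it is untested:

Code: [Select]
bits 32

section .text

global flush_stdin_sketch       ; hypothetical name

; void flush_stdin_sketch(void)
flush_stdin_sketch:
        push ebp
        mov ebp, esp
        sub esp, 4              ; scratch space (only one byte is used)
        push ebx                ; callee-saved
.next:
        mov eax, 3              ; sys_read
        xor ebx, ebx            ; fd 0 = stdin
        lea ecx, [ebp - 4]      ; read into the scratch byte
        mov edx, 1              ; one byte at a time
        int 0x80
        cmp eax, 1
        jne .done               ; 0 = end of input, negative = error
        cmp byte [ebp - 4], 0x0A
        jne .next               ; keep discarding until the linefeed
.done:
        pop ebx
        mov esp, ebp
        pop ebp
        ret

A caller would only want this when l_gets returned exactly len bytes and the last byte in the buffer is not 0x0A; otherwise the line has already been consumed and the flush would block waiting for (and then eat) the next line.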

Best,
Frank

Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 06, 2017, 06:12:05 PM
Should I not leave in the last part of the .buf_loop:

Code: [Select]

                inc eax                 ; advance the counter
                cmp edx, eax
                je .done
                jmp .buf_loop           ; continue

Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 06, 2017, 07:00:03 PM
I don't see why. After your sys_read, eax is the number of bytes actually read. edx is the maximum number to read. You're counting the number of bytes not read? There's something you're trying to do with this ".buf_loop" that I'm not getting. Quite possibly my fault...

Put it back in and see what it does, if you like...

Best,
Frank

Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 06, 2017, 08:06:21 PM
- How do you declare a variable that takes in a dynamic sized amount (for buf).. because right now buf is only big enough for 10 characters
- The program requirement is for it to stop reading when it reaches a newline; I think this is for parsing in a subsequent networking assignment. Why is the newline character at [ecx + eax -1]?
- Why are we only preserving and restoring ebx and not ecx or edx?
Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 06, 2017, 08:38:28 PM
You can get more memory with sys_brk... but you shouldn't have to worry about that. It's the caller's responsibility to provide the buffer and tell you how big it is. To get more buffer in "test main" just make it bigger.

The linefeed's at [ecx + eax -1] 'cause that's where it is. sys_read returns the number of bytes read in eax, and it stops when (doesn't stop until) it gets the linefeed. ecx, of course, is the beginning of the buffer.

As we've discussed, this is (probably) going to be different with a real file... or a socket. If the assignment really requires it, you could read one byte at a time (put 1 in edx and ignore what the caller tells you). Makes more sense to simply find it in the buffer...
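
For the stdin case, a minimal sketch of that "is the last byte a linefeed" check could look like the following. The names buf and BUFLEN and the exit codes are made up for this example, and it assumes the read comes from a terminal, so treat it as an illustration rather than part of the assignment:

Code: [Select]
; nasm -f elf32 lastbyte.asm
; ld -o lastbyte lastbyte.o      (add -m elf_i386 on a 64-bit system)

bits 32

BUFLEN equ 80                   ; assumed buffer size, only for this example

section .bss
buf resb BUFLEN

section .text
global _start
_start:
        mov eax, 3              ; sys_read
        xor ebx, ebx            ; fd 0 = stdin
        mov ecx, buf            ; beginning of the buffer
        mov edx, BUFLEN         ; maximum number of bytes to read
        int 0x80                ; eax = bytes actually read, or -errno

        mov ebx, 2              ; exit code 2: nothing read (end of file or error)
        test eax, eax
        jle .exit

        mov ebx, 1              ; exit code 1: last byte read is not a linefeed
        cmp byte [ecx + eax - 1], 10
        jne .exit

        xor ebx, ebx            ; exit code 0: the linefeed is the last byte read
.exit:
        mov eax, 1              ; sys_exit
        int 0x80

Run it, type a short line, and "echo $?" should show 0. Type more than 79 characters before "enter" and the linefeed no longer fits in the buffer, so it should show 1.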

Best,
Frank


Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 06, 2017, 09:09:17 PM
Will this code do what I need it to? It works using test main and a preset buffer, but if it is trying to take in a variable amount of text, will it do that and return the number of bytes read as the exit code, even if a newline character is located before the amount of "len"?


Code: [Select]
global l_gets

l_gets:
        push ebp                ; prologue, set up stack frame
        mov ebp, esp

        push ebx                ; preserve ebx

        mov ebx, [ebp + 8]      ; fd parameter goes into ebx
        mov ecx, [ebp + 12]     ; char *buf stored into ecx
        mov edx, [ebp + 16]     ; len stored into edx
       
        cmp edx, 0              ; if len is zero or less, exit program
        jle .done       

        cmp ecx, [ecx + eax -1] ; compare current character to new line, exit if true
        je .done
        ; read data onto stack:

        mov eax, 3              ; sys call for read, to begin reading bytes- for sys read, ebx= int (fd), ecx= char, edx= size_t (len)
        int 0x80               



        .done:

                pop ebx                 ; restore the caller's ebx
                mov esp, ebp            ; epilogue- clean up the stack
                pop ebp
                ret
Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 06, 2017, 09:25:35 PM
Something like this?
Code: [Select]
bits 32

%ifdef TESTMAIN
section .data

section .bss
buf resb 80

section .data
filename db `l_gets.asm\0` ; ourself - we know it's there


section .text
global _start
_start:

mov eax, 5 ; sys_open
mov ebx, filename
xor ecx, ecx
xor edx, edx
int 80h
test eax, eax
js exit ; bail out if error

; test a call to it

push 80 ; length
push buf
push eax ; fd
call l_gets
add esp, 4 * 3

; find NL - we want to stop there
xor edx, edx
find:
cmp [ecx + edx], byte 10
je found
inc edx
jmp find
found:
; edx is length to print

;print what we l_getsed

; mov edx, eax ; length read
mov eax, 4  ;sys_write
mov ebx, 1 ; stdout
mov ecx, buf
int 80h

exit:
                mov eax, 1              ; sys exit
                mov ebx, edx            ; return value is number of bytes written (len)
                int 0x80
%endif
;-----------------------------


global l_gets

l_gets:
        push ebp                ; prologue, set up stack frame
        mov ebp, esp

;        xor eax, eax            ; zero eax to prepare for syscall #
        push ebx                ; preserve ebx

        mov ebx, [ebp + 8]      ; fd parameter goes into ebx
        mov ecx, [ebp + 12]     ; char *buf stored into ecx
        mov edx, [ebp + 16]     ; len stored into edx
       
        cmp edx, 0              ; if len is zero or less, exit program
        jle .done       
        ; read data onto stack:
        .buf_loop:
   
;                push esp
;                cmp esp, byte 0x0A         ; if a newline character is encountered, exit
;                je .done       
;                sub esp, 4              ; will use esp for the pointer to the buffer, the bytes to be read will be pushed onto stack           
                mov eax, 3              ; sys call for read, to begin reading bytes- for sys read, ebx= int (fd), ecx= char, edx= size_t (len)
                int 0x80               
                ; read each character one at a time, increment counter (in eax), when counter matches len, jump out of loop

;                inc eax                 ; advance the counter
;                cmp edx, eax
;                je .done
;                jmp .buf_loop           ; continue


        .done:
pop ebx ; restore caller's reg

mov esp, ebp
pop ebp
ret

Could put the "find linefeed" part in the function. That may be what the assignment expects?

Best,
Frank

Ahhh, just read your latest. I think your code will work reading from stdin. (ahhh, no it won't - ecx is not likely 0xA - especially before the sys_read!) This tests from a real file. As I expected, it reads past the linefeed. Comment out the "find LF" part to see what it does without it...


Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 07, 2017, 10:06:01 PM
Frank,

At this point my brain is feeling clobbered and I want to break this simple function down step by step.

First off, I would just like to write the part of the code that pushes each byte onto the stack to be read.

Would I implement this as something like:

Code: [Select]
sys read
push buf
pop ecx
add [buf], 4
inc counter
if counter == len OR if ecx == 0x0A
jmp done

Is this the correct way? I am having a tough time understanding how the stack can be used to store one byte at a time as a kind of "buffer," but my professor says that it can be done. He isn't too keen on giving us the exact code on how to do it, however.

We're gonna have to pop each byte as it is pushed onto the stack, correct?
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 07, 2017, 10:49:59 PM
Here is a stack diagram I drew to try to visualize how this code is working:

Code: [Select]

bits 32

section .text

l_gets:
        push ebp        ; prologue
        mov ebp, esp
        push ebx

        mov ebx, [ebp + 8]        ; set up registers on stack for args
        mov ecx, [ebp + 12]
        mov edx, [ebp + 16]

        cmp edx, 0
        jle .done

        .read_loop:
                mov eax, 3    ; sys call read
                int 0x80

                push buf        ; push buf onto a stack because this is where each character is going to be stored
                pop ecx         ; pops the current character in ecx into buf
                ; this part I am getting hung up on, how to increment to the next character to be read in from whatever file is coming in

                cmp edx, char_count ; jump if number of bytes read has reached "len"
                je .done

        .done:
                pop ebx
                mov esp, ebp        ; epilogue
                pop ebp
                ret

(http://i.imgur.com/tbed9hM.jpg)
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 08, 2017, 04:29:33 AM
i was able to write this functioning l_gets:

Code: [Select]
bits 32

section .text

global l_gets

l_gets:
        push ebp                ; prologue
        mov ebp, esp

        push ebx                ; preserve ebx

        mov ebx, [ebp + 8]      ; fd parameter into ebx
        mov ecx, [ebp + 12]     ; char *buf in ecx
        mov edx, [ebp + 16]     ; len in edx

        cmp edx, 0              ; if len zero or less, exit
        jle .done

        mov eax, 3              ; sys read
        int 0x80
       
        .done:
                pop ebx                 ; restore ebx
                mov esp, ebp            ; epilogue
                pop ebp
                ret

When using your testmain though, you move 10 to edx (for 10 characters). Even though I type less than 10 characters into stdin, i always get a return value of 10.. the program should return only the number of bytes read. This and figuring out how to stop input after the newline character is read are all that's left and i'm pretty much done with this one. Onto the next..
Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 08, 2017, 04:37:25 AM
Ahhh... your instructor is apparently not explaining this well. I probably won't explain it well, either. Such is life...

You can indeed put one byte on the stack, but forget about "push" and "pop" to do so. First, let us create a "local" variable on the stack to use as a "buffer":
Code: [Select]
func:
    push ebp
    mov ebp, esp
    sub esp, 64
    ...

The amount we subtract from esp should be a multiple of 4, just to keep the stack well aligned. It could be more (or less) but probably shouldn't be more than 4096 (a "page"). So we've got a local variable that we can use as a buffer (or anything else). Its address is ebp - 64. That's the address of the first (lowest) byte, as usual.

I guess what you're trying to do is to read one byte at a time into this buffer - instead of the buffer the caller has provided? If so, we ignore the buffer the caller has provided, and the length. We still need the file descriptor...
Code: [Select]
func:
    push ebp
    mov ebp, esp
    sub esp, 64
    mov ebx, [ebp + 8]
    lea ecx, [ebp - 64]
    xor esi, esi ; to use as counter
    mov edx, 1 ; read just one byte
.top:
    mov eax, 3 ; sys_read
    int 0x80
    cmp [ecx], byte 0xA ; linefeed?
    je .done
    inc ecx ; next byte in buffer
    inc esi ; count it
    cmp esi, 64 ; we're out of local buffer
    je .done
    cmp esi, [ebp + 16] ; caller's idea of length
    je .done
    jmp .top ; go get another one
.done:
    ; do something intelligent...
    ; clean up and go home
We may need more than one ".done:"  label here... I don't think I'm too fond of this approach.

I think what I would rather do is a perfectly ordinary sys_read into the buffer the caller provides for the length the caller provides. Then... if the file descriptor is stdin, make sure that we do have that linefeed. If not, read into a dummy buffer until we find it to flush the keyboard buffer. This is complicated by the fact that *nix is fairly tolerant about file descriptors. If we try to read from stdout or even stderr, we still read from the keyboard. I'm not sure what happens with stdaux. If the file descriptor is 4 or greater, we presumably have a disk file or socket. This will read past the linefeed. In the case of a disk file, we could seek back to where the linefeed was found. I don't think this will work on a socket. We may have no choice but to read one byte at a time. I think I'd still read into the buffer the caller provides...

Quote
We're gonna have to pop each byte as it is pushed onto the stack, correct?

I don't think so. Why push it at all if you're only going to pop it again? I'm still not sure I understand what you're trying to do here.

Your diagram looks pretty good except that you've left out the return address (below the parameters and above old ebp). Also, esp points at what  you pushed last, not below it yet...

Code: [Select]
push buf
pushes the address of buf - probably not useful. You could push four bytes and ignore all but one. To increment an address, you want to put it in a register and increment that.
Code: [Select]
; if ecx is caller's buffer...
lea edi, [ebp - 64] ; our buffer on stack
.top:
mov al, [ecx]
mov [edi], al
inc edi ; or use stosb
inc ecx
jmp .top
That will need an exit condition, of course. I've ignored preserving registers for now. Beside ebx (used by system calls) we'll probably need both esi and edi. That can be added - just trying to keep it simple. Doubt if I've succeeded...

Best,
Frank

next post: probably my fault you're getting the bad return value. Check closely what I've done...

Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 08, 2017, 05:31:48 AM
Thanks, some things make sense now, others still don't.

Here's what I have so far. I'm not spending any time on this one tonight.

Code: [Select]
global l_gets

l_gets:
        push ebp                ; prologue
        mov ebp, esp

        push ebx                ; preserve ebx

        mov ebx, [ebp + 8]      ; fd parameter into ebx
        mov ecx, [ebp + 12]     ; char *buf in ecx
        mov edx, [ebp + 16]     ; len in edx
        xor esi, esi            ; counter

        cmp edx, 0              ; if len zero or less, exit
        jle .done

        .char_loop:       
       
                mov eax, 3              ; sys read
                int 0x80

                cmp [ecx], byte 0xA     ; test for linefeed
                je .done

                inc ecx                 ; advance to next byte
                inc esi                 ; +count

                cmp esi, [ebp + 16]     ; does read bytes = len?
                je .done

                jmp .char_loop

               
       
        .done:
                mov eax, esi            ; # bytes read into eax               
                pop ebx                 ; restore ebx
                mov esp, ebp            ; epilogue
                pop ebp
                ret
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 08, 2017, 05:40:14 PM
These additional functions I need to complete:

void l_puts(const char *buf);
write the contents of the null terminated string buf to stdout. The null byte must not be written. If the length of the string is zero, then no bytes are to be written.

int l_write(int fd, char *buf, int len);
write len bytes from buffer buf to file fd. Return the number of bytes actually written or -1 if an error occurs.

int l_open(const char *name, int flags, int mode);
opens the named file with the supplied flags and mode. Returns the integer file descriptor of the newly opened file or -1 if the file can't be opened.

int l_close(int fd);
close the indicated file, returns 0 on success or -1 on failure.

int l_exit(int rc);
terminate the calling program with exit code rc.



Can you maybe help to "guide" as far as how I should set these up so I can work on these this afternoon? Thank you again
Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 08, 2017, 06:17:52 PM
Well, "puts" is just a sys_write to stdout. The only "different" thing about it is the null-terminated string. You can call your l_strlen and move the length to edx or do:
Code: [Select]
cmp [ecx + edx], byte 0
; etc.

The rest of 'em are just wrappers around system calls. They'll look a lot like your l_gets. If error, eax will be -ERRNO - you need to change it to -1. The real C library does this and puts the error number in the global variable "errno". You apparently don't need to do that. Probably were supposed to do that for l_gets, too.
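
As a sketch of what such a wrapper might look like, here is one possible l_close. It assumes 32-bit Linux (where sys_close is system call 6) and the cdecl convention used elsewhere in this thread. It is an illustration, not the required solution:

Code: [Select]
bits 32

section .text

global l_close

; int l_close(int fd);
l_close:
        push ebp                ; prologue
        mov ebp, esp
        push ebx                ; ebx is callee-saved and the system call needs it

        mov ebx, [ebp + 8]      ; fd
        mov eax, 6              ; sys_close
        int 0x80                ; eax = 0 on success, -errno on failure

        test eax, eax           ; just to set flags
        jns .done               ; not negative: no error, return eax as it is
        mov eax, -1             ; collapse any -errno into -1
.done:
        pop ebx                 ; restore ebx
        mov esp, ebp            ; epilogue
        pop ebp
        ret

The same shape should carry over to l_open and l_write: load the parameters into ebx, ecx and edx, put the system call number in eax, and turn a negative result into -1.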

Best,
Frank

Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 08, 2017, 09:45:49 PM
Does this look good for l_puts:

Code: [Select]
bits 32

section .text

global l_puts

l_puts:
        push ebp                ; prologue
        mov ebp, esp   

        push ebx                ; preserve ebx

        mov ebx, [ebp + 8]      ; const char *buf goes into ebx

        .char_loop:
                cmp [ebx], byte 0x0     ; look for null terminator
                je .done

                mov eax, 4              ; sys write
                int 0x80               

                jmp .char_loop


        .done:
                pop ebx                 ; restore ebx

                mov esp, ebp            ; epilogue
                pop ebp
                ret

I'm thinking I would need to write each byte, kind of like l_gets reads in each byte. The sys_write takes three args: int, const char*, size_t. Would I need to set up the stack in the .char_loop so that ebx, ecx, and edx hold each of these arguments? I'm a little confused here because the actual call to l_puts only takes in one argument (or are these called parameters?), so not sure how to set this up.
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 08, 2017, 10:22:26 PM
Here is what I have so far for l_write:

Code: [Select]
bits 32

section .text

global l_write

l_write:
        push ebp                ;prologue
        mov ebp, esp

        push ebx

        mov ebx, [ebp + 8]      ; fd stored in ebx
        mov ecx, [ebp + 12]     ; char *buf stored in ecx
        mov edx, [ebp + 16]     ; len stored in edx

        xor esi, esi            ; counter

        cmp edx, 0              ; check for error
        jle .error

        .char_loop:
                mov eax, 4              ; sys write
                int 0x80

                inc ecx
                inc esi

                cmp esi, [ebp + 16]     ; does bytes written = len?
                je .done

                jmp .char_loop



        .error:
                mov eax, -1             ; error
                pop ebx                 ; restore ebx
               
                mov esp, ebp            ; epilogue
                pop ebp
                ret

        .done:
                mov eax, esi            ; return # bytes written
                pop ebx                 ; restore ebx

                mov esp, ebp            ; epilogue
                pop ebp
                ret

Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 08, 2017, 10:26:54 PM
Well, no. The one and only parameter (you can call it an argument), the address of the string, goes in ecx. We know that the file descriptor wants to be stdout (1) - that goes in ebx. The length, which you need to find either by calling l_strlen or by finding the zero here, goes in edx. You might want to use a loop like this:
Code: [Select]
; address is in ecx
xor edx, edx
.find:
cmp [ecx + edx], byte 0
jnz .found
inc edx
jmp .find
Or call l_strlen...
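
Calling l_strlen instead might look something like this sketch. It assumes l_strlen follows cdecl (one pointer parameter, length returned in eax) and is linked in from another file, hence the "extern". Since a cdecl function is free to trash ecx and edx, the string address is reloaded after the call:

Code: [Select]
bits 32

section .text

extern l_strlen                 ; assumed: int l_strlen(const char *s), cdecl
global l_puts

; void l_puts(const char *buf);
l_puts:
        push ebp                ; prologue
        mov ebp, esp
        push ebx                ; preserve ebx

        push dword [ebp + 8]    ; parameter for l_strlen
        call l_strlen
        add esp, 4              ; caller cleans the stack

        mov edx, eax            ; length, not counting the zero
        mov ecx, [ebp + 8]      ; address of the string (reloaded after the call)
        mov ebx, 1              ; stdout
        mov eax, 4              ; sys_write
        int 0x80                ; with edx = 0 nothing gets written

        pop ebx                 ; restore ebx
        mov esp, ebp            ; epilogue
        pop ebp
        ret

Link the two object files together, e.g. "ld -o test test.o l_puts.o l_strlen.o" (the file names are only examples).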

Best,
Frank

Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 08, 2017, 10:41:33 PM
Code: [Select]
bits 32

section .text

global l_write

l_write:
        push ebp                ;prologue
        mov ebp, esp

        push ebx

        mov ebx, [ebp + 8]      ; fd stored in ebx
        mov ecx, [ebp + 12]     ; char *buf stored in ecx
        mov edx, [ebp + 16]     ; len stored in edx

        xor esi, esi            ; counter
; you don't need a counter, eax does it
; if you do use esi, you need to push/pop it with ebx

        cmp edx, 0              ; check for error
; and eax will be negative if error, not edx
; and you need to do this after sys_write, not before

        jle .error

        .char_loop:
                mov eax, 4              ; sys write
                int 0x80

                inc ecx
                inc esi

                cmp esi, [ebp + 16]     ; does bytes written = len?
                je .done
; eax will be edx... even if some of 'em are garbage
; unless there's an error

                jmp .char_loop



        .error:
                mov eax, -1             ; error
                pop ebx                 ; restore ebx
               
                mov esp, ebp            ; epilogue
                pop ebp
                ret

        .done:
                mov eax, esi            ; return # bytes written
                pop ebx                 ; restore ebx

                mov esp, ebp            ; epilogue
                pop ebp
                ret

Best,
Frank

Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 08, 2017, 11:03:25 PM
Code: [Select]
; address is in ecx
xor edx, edx
.find:
cmp [ecx + edx], byte 0
jnz .found
inc edx
jmp .find

Did you mean that to say "je .found" instead of "jnz .found" ?
Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 08, 2017, 11:19:42 PM
Sure enough. My bad.

Best,
Frank

Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 09, 2017, 06:53:20 PM
OK so I got a lot going through my head now, back to the l_gets,

can you take a look at the code and let me know where I should go from here?

Code: [Select]
l_gets:
        push ebp                ; prologue
        mov ebp, esp

        push ebx                ; preserve ebx

        mov ebx, [ebp + 8]      ; fd parameter into ebx
        mov ecx, [ebp + 12]     ; char *buf in ecx
        mov edx, [ebp + 16]     ; len in edx
        xor esi, esi            ; counter

        cmp edx, 0              ; if len zero or less, exit
        jle .done

        .char_loop:       
       
                mov eax, 3              ; sys read
                int 0x80

                cmp [ecx], byte 0xA     ; test for linefeed
                je .done

                inc ecx                 ; advance to next byte
                inc esi                 ; +count

                cmp esi, [ebp + 16]     ; does read bytes = len?
                je .done

                jmp .char_loop

               
       
        .done:
                mov eax, esi            ; # bytes read into eax               
                pop ebx                 ; restore ebx
                mov esp, ebp            ; epilogue
                pop ebp
                ret
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 09, 2017, 07:57:08 PM
Frank can you provide a little insight as to why

Code: [Select]
cmp [ecx + edx], byte 0

is used to find a null terminator for "l_puts"

ecx= the pointer to the address of the string, correct? void l_puts(const char *buf)

if edx starts at 0, is the first iteration checking the first character in the string for 0?
Then if not zero, edx increments to 1, does that mean that the second byte in the string (or "character array") is then checked for zero? It's not making sense to me how this is working and seems like we have to take a lot on faith in this programming stuff.
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 09, 2017, 08:13:09 PM
code for l_puts:

Code: [Select]
bits 32

section .text

global l_puts

l_puts:
        push ebp                ; prologue
        mov ebp, esp   

        push ebx                ; preserve ebx
        push edi                ; preserve edi
        push esi                ; preserve esi

        mov ebx, 1              ; 1= stdout     
        mov ecx, [ebp + 12]     ; const char *buf [address of string] goes into ecx
       
        xor edx, edx

        .char_loop:
                cmp [ecx + edx], byte 0         ; look for null terminator
                je .done
               
                mov eax, 4              ; sys write
                int 0x80

                inc edx
                jmp .char_loop

        .done:
                pop esi                 ; restore esi
                pop edi                 ; restore edi               
                pop ebx                 ; restore ebx

                mov esp, ebp            ; epilogue
                pop ebp
                ret

notice I am now preserving and restoring edi and esi registers in addition to ebx, per instructions by my professor
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 09, 2017, 09:22:42 PM
What I got so far for l_write:

Code: [Select]
bits 32

section .text

global l_write

l_write:
        push ebp                ;prologue
        mov ebp, esp

        push ebx                ; preserve registers
        push edi
        push esi

        mov ebx, [ebp + 8]      ; fd stored in ebx
        mov ecx, [ebp + 12]     ; char *buf stored in ecx
        mov edx, [ebp + 16]     ; len stored in edx

        xor esi, esi            ; counter

        cmp edx, 0              ; check for 0 len
        jle .done

        .char_loop:
                               
                mov eax, 4              ; sys write
                int 0x80

                cmp eax, 0              ; check for error
                jle .error

                inc ecx                 ; move to next char
                inc esi                 ; increment counter

                cmp esi, [ebp + 16]     ; does bytes written = len?
                je .done

                jmp .char_loop

        .error:
                mov eax, -1             ; error
               
                pop esi                 ; restore registers
                pop edi
                pop ebx               
               
                mov esp, ebp            ; epilogue
                pop ebp
                ret

        .done:
                mov eax, esi            ; return # bytes written
               
                pop esi
                pop edi               
                pop ebx                 ; restore registers

                mov esp, ebp            ; epilogue
                pop ebp
                ret

l_write instructions:

int l_write(int fd, char *buf, int len)
write len bytes from buffer buf to file fd. Return the number of bytes actually written or -1 if an error occurs.
Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 09, 2017, 09:35:43 PM
If you're using esi, you ought to preserve it. Push it right after ebx and pop it right before, or the other way around. That's easy.

Now... s'pose we're reading from stdin, and the user types nothing but the "enter" key. After the sys_read eax will be 1 and that's what we want to return - but esi is still zero, no? You may want to start off with esi = 1. Suppose the length, as provided by the caller, is 4, and the user types 3 characters and "enter". eax will be 4 and I guess that's what esi will be when we put it back into eax. I may have to try that one. If the user types 4 or more characters before "enter", eax will be 4 and that's what we'll return. The linefeed and perhaps some characters will remain in the "keyboard buffer" to mess us up later unless we flush them. The assignment doesn't say anything about that, so I guess we can ignore it. We may regret that.

Suppose we're reading from a disk file, or socket. We'll read edx bytes, regardless of linefeeds. Your code counts up to the linefeed (if any) and returns that, "as if" we had stopped at the linefeed. But we didn't. Another read from that file will start where we left off, edx bytes into the file, not at the linefeed. This may not be satisfactory. The assignment says to stop at the linefeed. The only way I can think of to do that is to read one byte at a time, ugly as that is. I really don't know what to advise you on this. Best to stick to the assignment, I'm afraid...

If I get to it, I'll download your code and try it. As we have discovered, untested code can have misteaks. :)

Best,
Frank

Aw, jeez, three new messages? I'll get back to ya...

Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 09, 2017, 11:06:57 PM
Quote
Frank can you provide a little insight as to why

Code: [Select]
cmp [ecx + edx], byte 0

is used to find a null terminator for "l_puts"

ecx= the pointer to the address of the string, correct? void l_puts(const char *buf)

if edx starts at 0, is the first iteration checking the first character in the string for 0?
Then if not zero, edx increments to 1, does that mean that the second byte in the string (or "character array") is then checked for zero? It's not making sense to me how this is working and seems like we have to take a lot on faith in this programming stuff.

No faith, just logic. Everything you say is correct and as it should be. If the zero is the first character, the length is zero - we don't want to count the zero as part of the length. If the zero is the second character, the length is 1, etc.

However...
Code: [Select]
bits 32

section .text

global l_puts

l_puts:
        push ebp                ; prologue
        mov ebp, esp   

        push ebx                ; preserve ebx
        push edi                ; preserve edi
        push esi                ; preserve esi

        mov ebx, 1              ; 1= stdout     
        mov ecx, [ebp + 8]      ; const char *buf [address of string] goes into ecx

; Not [ebp + 12]: it's at [ebp + 8], being the first and only parameter!
     
        xor edx, edx

        .char_loop:
                cmp [ecx + edx], byte 0         ; look for null terminator
;                je .done
; No, only "found length", not "done"!
    je .found_length
    inc edx
    jmp .char_loop
.found_length:

; now we've got ebx, ecx, and edx where we want 'em               
                mov eax, 4              ; sys write
                int 0x80
    test eax, eax ; just to set flags
    jns .done ; no error
    mov eax, -1
    ; or eax, -1 ; shorter way to do the same thing
;                inc edx
;                jmp .char_loop

        .done:
                pop esi                 ; restore esi
                pop edi                 ; restore edi               
                pop ebx                 ; restore ebx

                mov esp, ebp            ; epilogue
                pop ebp
                ret

Does no harm to preserve registers we don't use.

"l_write" is simpler than you've got it. With the exception of making the error -1, depriving the caller of "what" went wrong, it's just sys_write.
Code: [Select]
bits 32

section .text

global l_write

l_write:
        push ebp                ;prologue
        mov ebp, esp

        push ebx                ; preserve registers
        push edi
        push esi

        mov ebx, [ebp + 8]      ; fd stored in ebx
        mov ecx, [ebp + 12]     ; char *buf stored in ecx
        mov edx, [ebp + 16]     ; len stored in edx

;        xor esi, esi            ; counter
; we don't need a counter

        cmp edx, 0              ; check for 0 len
        jle .done
; does no harm to check caller for idiocy
; we don't write anything anyway


;        .char_loop:
                               
                mov eax, 4              ; sys write
                int 0x80

                cmp eax, 0              ; check for error
                jle .error
; probably should be just "jl"
; strictly speaking , 0 is not an error
;                inc ecx                 ; move to next char
;                inc esi                 ; increment counter

;                cmp esi, [ebp + 16]     ; does bytes written = len?
;                je .done

;                jmp .char_loop
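; we don't need any of that
                jmp .done               ; no error: skip the error path below


        .error:
                mov eax, -1             ; error
               
                pop esi                 ; restore registers
                pop edi
                pop ebx               
               
                mov esp, ebp            ; epilogue
                pop ebp
                ret

        .done:
;                mov eax, esi            ; return # bytes written
; esi is no longer a counter - sys_write already left the count in eax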
               
                pop esi
                pop edi               
                pop ebx                 ; restore registers

                mov esp, ebp            ; epilogue
                pop ebp
                ret

No real need to duplicate the entire "clean up and go home". We just need to make eax -1 if it was negative (depriving the caller of useful information) and leave it alone if no error. Does no harm.
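
A sketch of l_write with a single "clean up and go home" might look like this. Returning 0 when len is zero or less is an assumption here (the assignment text only mentions the byte count and -1), and only ebx is preserved because it is the only callee-saved register this version touches:

Code: [Select]
bits 32

section .text

global l_write

; int l_write(int fd, char *buf, int len);
l_write:
        push ebp                ; prologue
        mov ebp, esp
        push ebx                ; preserve ebx

        mov ebx, [ebp + 8]      ; fd
        mov ecx, [ebp + 12]     ; char *buf
        mov edx, [ebp + 16]     ; len

        xor eax, eax            ; assumed: return 0 if len is zero or less
        cmp edx, 0
        jle .done

        mov eax, 4              ; sys_write
        int 0x80                ; eax = bytes actually written, or -errno

        test eax, eax           ; just to set flags
        jns .done               ; no error: eax already is the return value
        mov eax, -1             ; error
.done:
        pop ebx                 ; restore ebx
        mov esp, ebp            ; epilogue
        pop ebp
        ret

No counter is needed because sys_write itself reports how many bytes it wrote in eax.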

It probably would have been a good idea to make each of these functions a separate "topic". A little late now.

Now... see if I still feel like looking at l_gets...

Best,
Frank

Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 10, 2017, 02:46:43 AM
ok, nearly final l_write:

Code: [Select]

bits 32

section .text

global l_write

l_write:
        push ebp                ;prologue
        mov ebp, esp

        push ebx                ; preserve registers
        push edi
        push esi

        mov ebx, [ebp + 8]      ; fd stored in ebx
        mov ecx, [ebp + 12]     ; char *buf stored in ecx
        mov edx, [ebp + 16]     ; len stored in edx

        cmp edx, 0              ; check for 0 len
        jle .done
                               
        mov eax, 4              ; sys write
        int 0x80

        cmp eax, 0              ; check for error (when eax is less than zero)
        jl .error

        .done:
               
                pop esi
                pop edi               
                pop ebx                 ; restore registers

                mov esp, ebp            ; epilogue
                pop ebp
                ret

        .error:
                mov eax, -1             ; error
                jmp .done

I'm not following how this would return the number of bytes written. That is why I was using esi as a counter: it was incremented once for every byte written and then moved into eax before returning. Or does that part need to be amended?
Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 10, 2017, 02:54:38 AM
This is what I've got for l_gets. It's pretty much your code. I moved "inc esi" up to the top so our first try is 1, not 0. If esi==len, we do want to do that read, in case the LF is there. Now we do. I cut back to reading one byte at a time, little as I like it. Fugly! I did not attempt to "flush the buffer". I indicated where we might want to - only if we're reading from stdin!

Code: [Select]
; nasm -f elf32 l_gets.asm -d TESTMAIN
; ld -o l_gets l_gets.o

bits 32

%ifdef TESTMAIN

section .bss
buf resb 80
fd resd 1

section .data
filename db `l_gets.asm\0` ; ourself - we know it's there

section .text
global _start
_start:

mov eax, 5 ; sys_open
mov ebx, filename
xor ecx, ecx
xor edx, edx
int 80h
test eax, eax
js exit ; bail out if error
mov [fd], eax

; test a call to it

; try multiple calls if we're reading file
; just to make sure we're stopping at LF
; and can continue from there
mov esi, 7
top:

push 80 ; length
push buf
push dword [fd]
call l_gets
add esp, 4 * 3

; print what we l_getsed - l_got?

mov edx, eax ; length read
mov eax, 4  ;sys_write
mov ebx, 1 ; stdout
mov ecx, buf
int 80h

; only if we're doing multiple reads
dec esi
jnz top

exit:
                mov ebx, eax            ; return value is number of bytes written (len)
                mov eax, 1              ; sys exit
                int 0x80
%endif
;-----------------------------
section .text

global l_gets

l_gets:
        push ebp                ; prologue
        mov ebp, esp

        push ebx                ; preserve ebx
        push esi                ; preserve esi

        mov ebx, [ebp + 8]      ; fd parameter into ebx
        mov ecx, [ebp + 12]     ; char *buf in ecx
        mov edx, [ebp + 16]     ; len in edx
        xor esi, esi            ; counter

        cmp edx, 0              ; if len zero or less, exit
        jle .done

        mov edx, 1              ; read one byte at a time. ugh!

        .char_loop:
                inc esi                 ; increment count first
                mov eax, 3              ; sys read
                int 0x80

                cmp [ecx], byte 0xA     ; test for linefeed
                je .done

                inc ecx                 ; advance to next byte

                cmp esi, [ebp + 16]     ; does read bytes = len?
                je .done
; if this happens, we didn't find a LF
; if we're reading stdin, this indicates overflow
; we might want to flush OS's input buffer ("keyboard buffer")

                jmp .char_loop
       
        .done:
                mov eax, esi            ; # bytes read into eax               

                pop esi                 ; restore esi
                pop ebx                 ; restore ebx

                mov esp, ebp            ; epilogue
                pop ebp
                ret
;-----------------------------------------------


That's as far as I got. I'm not sure it's "complete". See what you think...

Best,
Frank

Ah, again we're bumping into each other. Your l_write looks good at first glance. It returns the number of bytes written because that's what sys_write does!
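To see that point about sys_write concretely, here is a minimal test driver in the same spirit as the TESTMAIN block above. It is only a sketch: the names msg and msg_len and the file name are made up for illustration, and the l_write body is a condensed copy of the version posted earlier with one assumed change, namely that eax is zeroed before the length check so that a len of zero or less returns 0 instead of whatever happened to be in eax.

```nasm
; nasm -f elf32 l_write_test.asm    (file name is hypothetical)
; ld -m elf_i386 -o l_write_test l_write_test.o    (-m elf_i386 for 64-bit hosts)
; ./l_write_test ; echo $?          (exit status = l_write's return value)

bits 32

section .data
msg     db 'hello', 0xA         ; hypothetical test string, 6 bytes
msg_len equ $ - msg

section .text
global _start
_start:
        push msg_len            ; len
        push msg                ; buf
        push 1                  ; fd = stdout
        call l_write
        add esp, 4 * 3          ; cdecl: caller cleans the stack

        mov ebx, eax            ; exit status = bytes written (or -1)
        mov eax, 1              ; sys_exit
        int 0x80

l_write:
        push ebp                ; prologue
        mov ebp, esp
        push ebx                ; ebx is callee-saved in cdecl

        xor eax, eax            ; assumed: return 0 if len <= 0
        mov ebx, [ebp + 8]      ; fd
        mov ecx, [ebp + 12]     ; buf
        mov edx, [ebp + 16]     ; len
        cmp edx, 0
        jle .done

        mov eax, 4              ; sys_write
        int 0x80                ; kernel leaves bytes written (or -errno) in eax
        test eax, eax
        jns .done               ; non-negative: pass the count straight through
        mov eax, -1             ; any error becomes -1

        .done:
        pop ebx
        mov esp, ebp            ; epilogue
        pop ebp
        ret
```

With stdout attached to a terminal this should print hello, and the shell should then report 6. No counter register is involved: the count comes back from the kernel in eax.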
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 10, 2017, 02:56:06 AM
completed l_puts:

Code: [Select]
bits 32

section .text

global l_puts

l_puts:
        push ebp                ; prologue
        mov ebp, esp   

        push ebx                ; preserve ebx
        push edi                ; preserve edi
        push esi                ; preserve esi

        mov ebx, 1              ; 1= stdout     
        mov ecx, [ebp + 8]      ; const char *buf [address of string] goes into ecx (first and only parameter)
       
        xor edx, edx            ; set up a counter

        .char_loop:
                cmp [ecx + edx], byte 0         ; look for null terminator
                je .found_len
               
                inc edx                 ; counter + 1
                jmp .char_loop

        .found_len:
                mov eax, 4              ; sys write
                int 0x80

                test eax, eax           ; set flags
                jns .done               ; jump if zero or positive (no error)
                mov eax, -1             ; set error code if error


        .done:
                pop esi                 ; restore esi
                pop edi                 ; restore edi               
                pop ebx                 ; restore ebx

                mov esp, ebp            ; epilogue
                pop ebp
                ret
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 10, 2017, 03:04:31 AM
completed l_open:

Code: [Select]
bits 32

section .text

global l_open

l_open:
        push ebp                ; prologue
        mov ebp, esp

        push ebx                ; preserve registers
        push edi
        push esi

        mov ebx, [ebp + 8]      ; name of the file (const char * name)
        mov ecx, [ebp + 12]     ; flags
        mov edx, [ebp + 16]     ; mode

        mov eax, 5              ; open sys call
        int 0x80

        cmp eax, 0              ; check for error
        jl .error

        .done:
                pop esi         ; restore registers
                pop edi
                pop ebx

                mov esp, ebp    ; epilogue
                pop ebp
                ret

        .error:
                mov eax, -1
                jmp .done

int l_open(const char *name, int flags, int mode)
opens the named file with the supplied flags and mode. Returns the integer file descriptor of the newly opened file or -1 if the file can't be opened.
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 10, 2017, 03:19:27 AM
completed l_close:

Code: [Select]
bits 32

section .text

global l_close

l_close:
        push ebp                ; prologue
        mov ebp, esp

        push ebx                ; preserve registers
        push edi
        push esi

        mov ebx, [ebp + 8]      ; file descriptor

        mov eax, 6              ; close sys call
        int 0x80

        cmp eax, 0
        jl .error               ; check for error
        je .success             ; check for success

        .done:
                pop esi         ; restore registers
                pop edi
                pop ebx

                mov esp, ebp    ; epilogue
                pop ebp
                ret       

        .error:
                mov eax, -1
                jmp .done
        .success:
                mov eax, 0
                jmp .done

int l_close(int fd)
closes the indicated file; returns 0 on success or -1 on failure.
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 10, 2017, 03:57:30 AM
And finally, the last function, "l_exit"

Code: [Select]
bits 32

section .text

global l_exit

l_exit:

; are these needed for exit:
;        push ebp                ; prologue
;        mov ebp, esp

;        push ebx                ; preserve registers
;        push edi
;        push esi

        mov ebx, [esp + 4]      ; int rc (no frame is set up here, so address it via esp, not ebp)
       
        mov eax, 1              ; exit sys call
        int 0x80

int l_exit(int rc)
terminates the calling program with exit code rc.
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 10, 2017, 04:03:38 AM
Should I be performing xor on all of these registers at the beginning of all of these functions?
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 10, 2017, 04:21:18 AM
Your l_gets seems to work great! However, it looks like it is repeating the same line 7 times, care to elaborate on that one?
The return value 37 is correct, 36 characters + linefeed  :D

(http://i.imgur.com/OMZFsOh.png)

Edit: aha, never mind, I see in the TESTMAIN code that you are repeating 7 times (esi = 7).
Ignore this post!
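
One case the test above does not exercise is running out of input: l_gets as posted never looks at what sys_read returned, so at end of file it keeps counting bytes that were never read until it reaches len. A possible variation is sketched below. Treating a read error the same as end of file (return whatever was read so far) is an assumption, not something the assignment text in this thread specifies.

```nasm
bits 32

section .text
global l_gets

l_gets:
        push ebp                ; prologue
        mov ebp, esp
        push ebx                ; callee-saved registers we use
        push esi

        mov ebx, [ebp + 8]      ; fd
        mov ecx, [ebp + 12]     ; char *buf
        xor esi, esi            ; bytes stored so far

        cmp dword [ebp + 16], 0 ; len zero or less: nothing to do
        jle .done

        mov edx, 1              ; still one byte per sys_read

        .char_loop:
                mov eax, 3              ; sys_read
                int 0x80
                cmp eax, 1              ; did exactly one byte arrive?
                jne .done               ; 0 = end of file, negative = error

                inc esi                 ; count only bytes that really arrived
                cmp byte [ecx], 0xA     ; stop after a linefeed
                je .done

                inc ecx                 ; advance to next byte
                cmp esi, [ebp + 16]     ; buffer full?
                jne .char_loop

        .done:
                mov eax, esi            ; return # bytes read
                pop esi
                pop ebx
                mov esp, ebp            ; epilogue
                pop ebp
                ret
```

The count is now incremented after the read instead of before it, so it only ever reflects bytes that were actually stored in buf.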

Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 15, 2017, 12:50:01 AM
Regarding l_open not working, here are the feedback comments:

Your l_open doesn't work correctly.  After sys_open, you don't check the return value.  Your l_open should return -1 if you fail to open the file for any reason.

My l_open code:

Code: [Select]
l_open:

        push ebp                ; prologue
        mov ebp, esp

        push ebx                ; preserve registers
        push edi
        push esi

        mov ebx, [ebp + 8]      ; name of the file (const char * name)
        mov ecx, [ebp + 12]     ; flags
        mov edx, [ebp + 16]     ; mode

        mov eax, 5              ; open sys call
        int 0x80

        cmp eax, 0              ; check for error
        jl .error

        .done:
                pop esi         ; restore registers
                pop edi
                pop ebx

                mov esp, ebp    ; epilogue
                pop ebp
                ret

        .error:
                mov eax, -1
                jmp .done


So... how do I fix it so that it returns -1 on failure to open?
Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 15, 2017, 01:59:45 AM
That looks correct to me!
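
One way to check this independently of the grader is a small driver that asks l_open for a file that should not exist and passes the result on as the exit status. This is a sketch only: the path /no/such/file and the file name are hypothetical, and the l_open body is a condensed copy of the code above (test/jns instead of cmp/jl, same behavior).

```nasm
; nasm -f elf32 l_open_test.asm     (file name is hypothetical)
; ld -m elf_i386 -o l_open_test l_open_test.o    (-m elf_i386 for 64-bit hosts)
; ./l_open_test ; echo $?

bits 32

section .data
badname db '/no/such/file', 0   ; hypothetical path assumed not to exist

section .text
global _start
_start:
        push 0                  ; mode (unused without O_CREAT)
        push 0                  ; flags = O_RDONLY
        push badname            ; name
        call l_open
        add esp, 4 * 3          ; cdecl: caller cleans the stack

        mov ebx, eax            ; exit status = l_open's return value
        mov eax, 1              ; sys_exit
        int 0x80

l_open:
        push ebp                ; prologue
        mov ebp, esp
        push ebx                ; ebx is callee-saved in cdecl

        mov ebx, [ebp + 8]      ; name
        mov ecx, [ebp + 12]     ; flags
        mov edx, [ebp + 16]     ; mode
        mov eax, 5              ; sys_open
        int 0x80                ; fd, or a negative errno such as -2 (ENOENT)
        test eax, eax
        jns .done
        mov eax, -1             ; collapse every error to -1

        .done:
        pop ebx
        mov esp, ebp            ; epilogue
        pop ebp
        ret
```

The shell only sees the low 8 bits of the exit status, so a return value of -1 shows up as 255. If that is what gets printed, the error path is doing its job.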