Author Topic: Help with writing custom C type string functions using NASM

Offline turtle13

  • Jr. Member
  • *
  • Posts: 73
Help with writing custom C type string functions using NASM
« on: September 04, 2017, 06:50:15 AM »
For a class assignment I must write custom versions of C type functions such as:

strlen, strcmp, gets, puts, write, open, close, exit

From what I understand, this requires using the cdecl calling convention so I will be preserving and restoring the ebx, edi, esi, and ebp registers, and the caller will clean the stack. eax holds the return value.

So far I have a skeleton for the strlen-type function (called l_strlen), which returns the number of characters in a given string as an int:

Code: [Select]
bits 32

section .data
; variables go here:
; var_name db values
string1 db 'string', 0          ; null terminated string
string1_len equ $ - string1     ; length of string1

section .text

global l_strlen


l_strlen:
        xor eax, eax            ; zero eax
        push eax                ; preserve eax
        push ebx                ; preserve ebx
        push edi                ; preserve edi
        push esi                ; preserve esi       
       
        push ebp                ; prologue: set up stack frame
        mov ebp, esp

        .char_loop
                ; while the byte (char) being compared is not "0"
                ;       add one to ecx
                ;       jmp .char_loop
                ; if the byte (char) is "0" and no characters remaining (meaning null terminated)
                ;       jmp .end_loop

        .end_loop

                mov esp, ebp            ; epilogue: restore caller's frame pointer
                pop ebp

                ret
                pop eax                 ; this is where the final return value is located
                pop esi                 ; restore esi
                pop edi                 ; restore edi
                pop ebx                 ; restore ebx

Questions about my code:

- for the .char_loop I have pseudocode, I'm trying to figure out exactly how to accomplish this task (or if the task is even appropriate?)

- How do I manipulate the code so that the string being measured is not statically declared like I did with variable 'string1' (such that 'l_strlen(any_string)') ?

- anything else that seems off to you (or better yet, is anything even correct?)

Offline Frank Kotler

  • NASM Developer
  • Hero Member
  • *****
  • Posts: 2667
  • Country: us
Re: Help with writing custom C type string functions using NASM
« Reply #1 on: September 04, 2017, 08:28:43 AM »
It's pretty much correct, but you've got some things out of order.

You don't need to "preserve" registers that you don't use. For this simple task, we can get by with registers that we're allowed to alter. That simplifies things. The "prologue", as the name suggests, wants to be the first thing in your function...

Code: [Select]
bits 32

section .data
; variables go here:
; var_name db values
string1 db 'string', 0          ; null terminated string
string1_len equ $ - string1     ; length of string1

section .text

;-----------------------------------------------
; this is a "test main" it should not be in your final code
        global _start
        _start:
        push string1 ; address of string1
        call l_strlen
        add esp, 4 ; "remove" parameter
; length is returned in eax
; make it our exit code
        mov ebx, eax
        mov eax, 1 ; sys_exit
        int 0x80
; end of "test main"
;-------------------------------------------

global l_strlen

l_strlen:
;        xor eax, eax            ; zero eax
;        push eax                ; preserve eax
;        push ebx                ; preserve ebx
;        push edi                ; preserve edi
;        push esi                ; preserve esi       
; this part we do need:
       
        push ebp                ; prologue: set up stack frame
        mov ebp, esp

; if we needed to preserve registers, do it here

        xor eax, eax ; since we want the result in eax
        mov ecx, [ebp + 8] ; first (only) parameter
        .char_loop
                ; while the byte (char) being compared is not "0"
; for clarity: the byte we're looking for is the number zero
; not the character "0". They're not the same thing!
        mov dl, [ecx]
        cmp dl, byte 0
        jz .end_loop
        inc eax ; increase counter
        inc ecx ; move to next character
                ;       add one to ecx
                ;       jmp .char_loop
        jmp .char_loop
                ; if the byte (char) is "0" and no characters remaining (meaning null terminated)
                ;       jmp .end_loop

        .end_loop

; if we had preserved registers, pop 'em here

                mov esp, ebp            ; epilogue: restore caller's frame pointer
                pop ebp

                ret
 
; this stuff after "ret" would never be reached anyway
;               pop eax                 ; this is where the final return value is located
;                pop esi                 ; restore esi
;                pop edi                 ; restore edi
;                pop ebx                 ; restore ebx


That's untested. I should know better than to post untested code, but it's late here...

As you can see, I've added a "test main" so that you can assemble and link the code and run it. As you probably know, we can see the exit code by typing "echo $?" right after the program runs. Only the low byte of the exit code is reported, so this only works for strings shorter than 256 characters, but that should be enough for a test. I think I've got it right, but no promises...

Best,
Frank
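
For reference, here is a minimal sketch of where the register saves would go if the function did use a callee-saved register. It is not from the thread: the name l_strlen_scas, the msg test string, and the choice of "repne scasb" are assumptions, picked only because scasb needs edi, which cdecl expects the callee to preserve.

```nasm
bits 32

section .data
msg db 'string', 0              ; test string for the test main (assumed name)

section .text

global _start
_start:                         ; "test main", not part of the final code
        push msg                ; address of the string
        call l_strlen_scas
        add esp, 4              ; caller removes the parameter (cdecl)
        mov ebx, eax            ; length becomes the exit code
        mov eax, 1              ; sys_exit
        int 0x80

; hypothetical l_strlen variant that needs a callee-saved register
l_strlen_scas:
        push ebp                ; prologue comes first
        mov ebp, esp
        push edi                ; then save the callee-saved registers we use

        mov edi, [ebp + 8]      ; first (only) parameter
        xor eax, eax            ; al = 0, the byte to search for
        mov ecx, -1             ; maximum count, so the scan is not cut short
        cld                     ; scan forward
        repne scasb             ; stops once the zero byte has been scanned
        not ecx                 ; ecx = bytes scanned, terminator included
        dec ecx                 ; leave the terminator out
        mov eax, ecx            ; return value goes in eax

        pop edi                 ; restore in reverse order, before the epilogue
        pop ebp                 ; esp is back at the saved ebp here
        ret
```

Assuming the file is saved as demo.asm, "nasm -f elf32 demo.asm" followed by "ld -m elf_i386 -o demo demo.o" should build it, and "echo $?" after running it should report 6 for the six-character test string.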


Offline turtle13

« Reply #2 on: September 05, 2017, 02:46:12 AM »
Frank, your advice worked perfectly. I assembled and linked the program with the short "main" function and it is returning the length of "string1" as the exit code!

Now I'm assuming that I don't need to leave in the
Code: [Select]
string1 db 'string', 0          ; null terminated string
string1_len equ $ - string1     ; length of string1
part of the code because this function should be used to examine any length string. Should I just delete those two lines or is anything else required to make this happen?


Why is "dl" used in 'mov dl, [ecx]' ? If I understand it, dl is the low order byte of the edx register, but how do edx and dl come into play here?

Thanks again!

Offline turtle13

« Reply #3 on: September 05, 2017, 03:26:16 AM »
Moving on to the next C function "strcmp"

Instructions:
int l_strcmp(char *str1, char *str2);
return 0 if str1 and str2 are equal, return 1 if they are not. Note that this is not the same definition as the C standard library function strcmp.

Here is my code so far:

Code: [Select]

bits 32

section .data

string1 db 'hello', 0
string2 db 'hello', 0
string3 db 'Hello!', 0


section .text

global l_strcmp

l_strcmp:

        push ebp                ; prologue: set up stack frame
        mov ebp, esp

        xor eax, eax            ; zero eax to prepare for storing result (0= equal, 1= not equal)
        mov ecx, [ebp + 8]      ; first parameter (string1) stored in ecx
        mov edx, [ebp + 12]     ; second parameter (string2) stored in edx

        .char_loop:
                ; code to compare every character in both strings
                mov cl, [ecx]           ; move the current character into the cl segment of ecx
                mov dl, [edx]           ; move the current character into the dl segment of edx
                cmp dl, cl
                jne .done_1                ; if char in string1 != string2, exit with result 1
               
                ; how to examine if the null terminator has been met and both strings match?               
                jmp .char_loop             ; continue examining characters


        .done_1:
               
                mov eax, 1              ; returns 1 when strings do not match
                mov esp, ebp            ; epilogue: restore caller's frame pointer
                pop ebp
                ret

        .done_0:

                mov eax, 0              ; returns 0 when strings do match
                mov esp, ebp
                pop ebp
                ret

A bit of misunderstanding on what "cl" and "dl" are doing.. need some clarification on that, as well as if the loops appear to operate properly.

Offline Frank Kotler

« Reply #4 on: September 05, 2017, 03:40:55 AM »
Good. What I found when I tried it was that NASM burped up a couple of warnings about the missing colons on a couple of labels (.char_loop and .end_loop). Just put colons on 'em, or turn that warning off.

The "string" can be considered part of the "test main". You don't need it in the final code.

The "dl" register was just someplace to put the single byte we're looking at. You don't need that, either. Could be done as:
Code: [Select]
cmp [ecx], byte 0
I was just trying to implement "something" from your pseudo-code. edx, and its "parts" dl and dh, are "volatile" according to the cdecl calling convention. We don't have to preserve it... so I used it.

We can get by without the stack frame, too. If we don't meddle with ebp, the first parameter is at [esp + 4]. We probably "should" use a stack frame, though - it allows a debugger to do a "back trace" to see where we were called from.

If you'll step into the museum for a moment... In 16-bit code, only bx and bp could be used for "base" registers. [sp + ?] was not a valid addressing mode. We had no choice but to...
Code: [Select]
push bp
mov bp, sp
mov ax, [bp + 4]
or whatever. 32-bit addressing modes are much more flexible - any general-purpose register can be a "base" register, so we can use [esp + 4], etc. Still, it is common to set up a stack frame...

If I'm feeling ambitious, I may work up a "super short" version of this. Probably not...

Best,
Frank
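
As an illustration of the two simplifications described in this reply (comparing the byte in memory directly, and skipping the stack frame), a shorter l_strlen might look like the sketch below. This is not Frank's "super short" version, just one possible reading of his remarks; the msg string and the test main are assumptions added so it can be run on its own.

```nasm
bits 32

section .data
msg db 'string', 0              ; test string (assumed name)

section .text

global _start
_start:                         ; "test main", not part of the final code
        push msg
        call l_strlen
        add esp, 4              ; "remove" parameter
        mov ebx, eax            ; length becomes the exit code
        mov eax, 1              ; sys_exit
        int 0x80

global l_strlen
l_strlen:
        mov ecx, [esp + 4]      ; no frame: parameter sits just above the return address
        xor eax, eax            ; eax is both the counter and the index
.char_loop:
        cmp byte [ecx + eax], 0 ; is this byte the terminating zero?
        je .end_loop
        inc eax                 ; count it and move to the next byte
        jmp .char_loop
.end_loop:
        ret                     ; nothing to restore, since nothing was saved
```

The trade-off is the one mentioned above: without the ebp frame a debugger has a harder time producing a back trace through this function.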


Offline Frank Kotler

« Reply #5 on: September 05, 2017, 03:48:36 AM »
You're getting ahead of me... you Hare! :)

Quote
A bit of misunderstanding on what "cl" and "dl" are doing..
Since cl and dl are the low bytes of ecx and edx, loading characters into them trashes ecx and edx, so they will no longer point to your strings. Use some other 8-bit registers - al and ah, perhaps.

Best,
Frank
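
A small stand-alone sketch of that aliasing, for anyone who wants to see it happen. The constant 0x11223344 is just a made-up stand-in for a pointer; the program exits with 1 if writing cl changed ecx in the way described.

```nasm
bits 32

section .text

global _start
_start:
        mov ecx, 0x11223344     ; pretend this is a pointer held in ecx
        mov cl, 0xFF            ; writing cl replaces only the low byte of ecx
        ; ecx is now 0x112233FF, so it no longer holds the original "pointer"

        xor ebx, ebx            ; clear ebx before the compare (xor changes the flags)
        cmp ecx, 0x112233FF
        sete bl                 ; bl = 1 if ecx holds the clobbered value
        mov eax, 1              ; sys_exit, exit code in ebx
        int 0x80
```

The same applies to dl and edx, and to al/ah and eax, which is why al and ah are only safe in l_strcmp because eax is not being used as a pointer there.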


Offline turtle13

« Reply #6 on: September 05, 2017, 03:54:54 AM »
^ I was thinking that as I was doing it, that edx and ecx would get messed up somehow. This assembly stuff is so strict, but at the same time it's the wild, wild west in how you handle data and instructions.

So here is my final version of the strcmp:

Code: [Select]
bits 32

section .data

string1 db 'hello', 0
string2 db 'hello', 0
string3 db 'Hello!', 0


section .text

global l_strcmp

l_strcmp:

        push ebp                ; prologue: set up stack frame
        mov ebp, esp

        xor eax, eax            ; zero eax to prepare for storing result (0= equal, 1= not equal)
        mov ecx, [ebp + 8]      ; first parameter (string1) stored in ecx
        mov edx, [ebp + 12]     ; second parameter (string2) stored in edx

        .char_loop:
                ; code to compare every character in both strings
                cmp [ecx], [edx]        ; compare the characters in the ecx, edx registers
                jne .done_1             ; if char in string1 != string2, exit with result 1
                cmp ecx, byte 0         ; tests for null terminator
                je .done_0              ; jump to done if null terminator         
                jmp .char_loop             ; continue examining characters


        .done_1:
               
                mov eax, 1              ; returns 1 when strings do not match
                mov esp, ebp            ; epilogue: restore caller's frame pointer
                pop ebp
                ret

        .done_0:

                mov eax, 0              ; returns 0 when strings do match
                mov esp, ebp
                pop ebp
                ret

I feel I've built a suspension bridge on quicksand with this one

*OK, so I just realized I forgot to increment ecx and edx. I would add:

Code: [Select]
inc ecx
inc edx

in the .char_loop, between "je .done_0" and "jmp .char_loop"
« Last Edit: September 05, 2017, 03:58:08 AM by turtle13 »

Offline Frank Kotler

« Reply #7 on: September 05, 2017, 04:01:20 AM »
I don't think that'll even assemble, will it?

Best,
Frank


Offline turtle13

« Reply #8 on: September 05, 2017, 04:09:59 AM »
nope, giving me an error with line "cmp [ecx], [edx]"

*just did it!! Returns 1 when strings are different, 0 when strings are the same!

Code: [Select]

bits 32

section .data

string1 db 'hello', 0
string2 db 'hello', 0
string3 db 'Hello!', 0


section .text

;-----------------------------------------------
; this is a "test main" it should not be in your final code
        global _start
        _start:
        push string1 ; address of string1
        push string3
        call l_strcmp
        add esp, 8 ; "remove" parameters
; result (0 or 1) is returned in eax
; make it our exit code
        mov ebx, eax
        mov eax, 1 ; sys_exit
        int 0x80
; end of "test main"
;-------------------------------------------

global l_strcmp

l_strcmp:

        push ebp                ; prologue: set up stack frame
        mov ebp, esp

        xor eax, eax            ; zero eax to prepare for storing result (0= equal, 1= not equal)
        mov ecx, [ebp + 8]      ; first parameter (string1) stored in ecx
        mov edx, [ebp + 12]     ; second parameter (string2) stored in edx

        .char_loop:
                ; code to compare every character in both strings
                mov al, [ecx]
                mov ah, [edx]               
                cmp al, ah        ; compare the current characters of the two strings
                jne .done_1             ; if char in string1 != string2, exit with result 1
                cmp al, byte 0         ; tests for null terminator
                je .done_0              ; jump to done if null terminator
                inc ecx
                inc edx         
                jmp .char_loop             ; continue examining characters


        .done_1:
               
                mov eax, 1              ; returns 1 when strings do not match
                mov esp, ebp            ; epilogue: restore caller's frame pointer
                pop ebp
                ret

        .done_0:

                mov eax, 0              ; returns 0 when strings do match
                mov esp, ebp
                pop ebp
                ret
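
The working version above can be tightened a little. The following is only a sketch of one way to do it, not a correction: it compares the loaded byte against memory directly (one memory operand is allowed, two are not) and funnels both results through a single epilogue. The test main and its two strings are carried over from the post above.

```nasm
bits 32

section .data
string1 db 'hello', 0
string3 db 'Hello!', 0

section .text

global _start
_start:                         ; "test main", not part of the final code
        push string3            ; second parameter is pushed first
        push string1            ; first parameter is pushed last
        call l_strcmp
        add esp, 8              ; "remove" parameters
        mov ebx, eax            ; result (0 or 1) becomes the exit code
        mov eax, 1              ; sys_exit
        int 0x80

global l_strcmp
l_strcmp:
        push ebp                ; prologue
        mov ebp, esp
        mov ecx, [ebp + 8]      ; str1
        mov edx, [ebp + 12]     ; str2
.char_loop:
        mov al, [ecx]           ; current character of str1
        cmp al, [edx]           ; register vs. memory is a legal operand pair
        jne .differ
        inc ecx
        inc edx
        test al, al             ; equal so far: was that the terminating zero?
        jnz .char_loop
        xor eax, eax            ; both strings ended together: return 0
        jmp .done
.differ:
        mov eax, 1              ; mismatch: return 1
.done:
        pop ebp                 ; esp was never moved, so this is the whole epilogue
        ret
```

With 'hello' and 'Hello!' the first characters already differ, so the exit code should be 1; pushing string1 twice instead should give 0.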

Offline Frank Kotler

« Reply #9 on: September 05, 2017, 04:53:12 AM »
There ya go!

Now... the real C "gets()" is notoriously unsafe. Some versions of gcc will warn if you try to use it. Instead, make the caller tell you how big the buffer is, and don't "get" any more than that. Please!

Best,
Frank


Offline turtle13

« Reply #10 on: September 05, 2017, 05:27:49 AM »
Now on to l_gets:

instructions:

int l_gets(int fd, char *buf, int len);
read at most len bytes from file fd, placing them into buffer buf. Terminate early if a newline character ('\n', 0x0A) is read. If a newline character is encountered, it should be stored in the output buffer and counted in the total number of bytes read. Return the total number of bytes read (which may be zero if end of file is reached or an error occurs). This function does not place a null terminator after the last character read; that is the responsibility of the caller.


Here is some code I have so far, I just want to make sure I am setting it up correctly:


Code: [Select]
bits 32

section .data



section .text

global l_gets

l_gets:
        push ebp                ; prologue, set up stack frame
        mov ebp, esp

        xor eax, eax            ; zero eax to prepare for storing return result

        mov ecx, [ebp + 8]      ; third parameter (int len) stored into ecx
        mov edx, [ebp + 12]     ; second parameter (char *buf) stored into edx
        mov esi, [ebp + 16]     ; first parameter (int fd) stored into esi

^since parameters for cdecl are stored right to left, that is why I am adding to the stack like that. Not sure if this is correct.

I'm lost as to where to/ how to store the buffer data. I should get the value for len, and loop that many times while writing the data to the buffer (which would be edx according to my code above)?
« Last Edit: September 05, 2017, 05:29:46 AM by turtle13 »
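
On the parameter-order question: under cdecl the caller pushes the arguments right to left, which means the first parameter ends up closest to the return address, at [ebp + 8], and later parameters sit at higher offsets. The sketch below shows this with a made-up function, pick_third, and made-up values 10, 20 and 30 standing in for fd, buf and len.

```nasm
bits 32

section .text

global _start
_start:
        push 30                 ; third parameter, pushed first (like len)
        push 20                 ; second parameter (like buf)
        push 10                 ; first parameter, pushed last (like fd)
        call pick_third
        add esp, 12             ; caller removes the three parameters
        mov ebx, eax            ; returned value becomes the exit code
        mov eax, 1              ; sys_exit
        int 0x80

; hypothetical helper: returns its third parameter
pick_third:
        push ebp
        mov ebp, esp
        ; [ebp + 0]   saved ebp
        ; [ebp + 4]   return address
        ; [ebp + 8]   first parameter  (10)
        ; [ebp + 12]  second parameter (20)
        ; [ebp + 16]  third parameter  (30)
        mov eax, [ebp + 16]
        pop ebp
        ret
```

Running it and checking "echo $?" should show 30, so for l_gets the fd is at [ebp + 8], buf at [ebp + 12] and len at [ebp + 16].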

Offline Frank Kotler

« Reply #11 on: September 05, 2017, 06:27:49 AM »
Looks remarkably like sys_read, does it not?

Code: [Select]
bits 32

section .data



section .text

global l_gets

l_gets:
        push ebp                ; prologue, set up stack frame
        mov ebp, esp

        xor eax, eax            ; zero eax to prepare for storing return result
; going to need it for the system call number, no?

; going to need to preserve ebx
push ebx

        mov ecx, [ebp + 8]      ; third parameter (int len) stored into ecx
; fd - going to want it in ebx

        mov edx, [ebp + 12]     ; second parameter (char *buf) stored into edx
; going to want it in ecx

        mov esi, [ebp + 16]     ; first parameter (int fd) stored into esi
; max length - going to want it in edx

; now do your sys_read
; if error (eax negative) we want it to be zero
; that's what it says...
; otherwise number of characters - like sys_read

pop ebx

; epilogue...

That's how I understand it, anyway...

You may want to "flush" any excess the pesky user types...

Best,
Frank
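
Filling in the outline above, a minimal sketch of l_gets as a thin wrapper around a single sys_read could look like this. The buf reservation, the size 64 and the test main are assumptions for the sake of a runnable example. Note the limitation: one read returns a single line when fd is a terminal, but on a file or socket it may return several lines at once, so this alone does not cover the "terminate early at newline" part of the assignment.

```nasm
bits 32

section .bss
buf resb 64                     ; test buffer (assumed name and size)

section .text

global _start
_start:                         ; "test main", not part of the final code
        push 64                 ; len
        push buf                ; buf
        push 0                  ; fd 0 = stdin
        call l_gets
        add esp, 12             ; "remove" parameters
        mov ebx, eax            ; bytes read becomes the exit code
        mov eax, 1              ; sys_exit
        int 0x80

global l_gets
l_gets:
        push ebp                ; prologue
        mov ebp, esp
        push ebx                ; ebx is callee-saved and the system call needs it

        mov ebx, [ebp + 8]      ; fd
        mov ecx, [ebp + 12]     ; buf
        mov edx, [ebp + 16]     ; len
        mov eax, 3              ; sys_read
        int 0x80                ; eax = bytes read, or a negative error code

        test eax, eax
        jns .done               ; zero or positive: return it as is
        xor eax, eax            ; error: the assignment asks for zero
.done:
        pop ebx                 ; restore before the epilogue
        pop ebp
        ret
```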


Offline turtle13

« Reply #12 on: September 05, 2017, 07:48:54 PM »
Here is what I came up with so far:

Code: [Select]
bits 32

section .data



section .text

global l_gets

l_gets:
        push ebp                ; prologue, set up stack frame
        mov ebp, esp

        xor eax, eax            ; zero eax to prepare for syscall #
        push ebx                ; preserve ebx

        mov ebx, [ebp + 8]      ; fd parameter goes into ebx
        mov ecx, [ebp + 12]     ; char *buf stored into ecx
        mov edx, [ebp + 16]     ; len stored into edx

        mov eax, 3              ; sys call for read
        int 0x80

        ; read data onto stack:
        .buf_loop:
                ; read each character one at a time, increment counter (in eax), when counter matches len, jump out of loop
                xor eax, eax            ; zero eax to be used for counter
                push ebx                ; push the character onto stack
                inc ebx                 ; advance to next character
                inc eax                 ; advance the counter
                cmp edx, eax
                je .done

        .loop1:
       

                cmp register, byte 0x0A           ; check for newline, exit loop if true
                je .done

        .done:
               

Hopefully the comments are enough to tell you about what I am trying to do here..

Offline Frank Kotler

« Reply #13 on: September 05, 2017, 10:28:41 PM »
Well... no...
Code: [Select]
bits 32

section .data



section .text

global l_gets

l_gets:
        push ebp                ; prologue, set up stack frame
        mov ebp, esp

        xor eax, eax            ; zero eax to prepare for syscall #
        push ebx                ; preserve ebx

        mov ebx, [ebp + 8]      ; fd parameter goes into ebx
        mov ecx, [ebp + 12]     ; char *buf stored into ecx
        mov edx, [ebp + 16]     ; len stored into edx

        mov eax, 3              ; sys call for read
        int 0x80

Up to here, I follow you. In fact, it looks like you're about done...

Code: [Select]
        ; read data onto stack:
        .buf_loop:
                ; read each character one at a time, increment counter (in eax), when counter matches len, jump out of loop
                xor eax, eax            ; zero eax to be used for counter
If you zero eax in the loop, it's going to run for a long time!
Code: [Select]
                push ebx                ; push the character onto stack
                inc ebx                 ; advance to next character
Last I knew, ebx was your file descriptor...

Code: [Select]
                inc eax                 ; advance the counter
                cmp edx, eax
                je .done
Fair enough... if you don't zero eax in the loop...
Code: [Select]
        .loop1:
       

                cmp register, byte 0x0A           ; check for newline, exit loop if true
                je .done

        .done:
I don't see where we "loop", and to where... The last part of it won't even assemble!

After the sys_read, your data's in the buffer that the caller specified, and eax holds the number of bytes read, including the linefeed that ends input. At least that's true if we're reading from stdin. I'm less sure of how sys_read will behave on a "real file" (or, for that matter, if stdin is redirected). If it's a "text file", okay, but what if it's a "binary file"? Are we expected to stop at any byte with the value 10 (0x0A) we encounter? I think of "gets()" as being exclusively for stdin, but your assigned "l_gets" is apparently different. I may have to experiment and see what happens on a "real file"...

I mentioned up above that you might want to "flush" any excess. That would apply only to stdin.

Later,
Frank
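
One way to sidestep the question of how sys_read behaves on different kinds of fd is to read one byte per system call and check each byte for a linefeed. That is slow but simple, and it stops at the first 0x0A regardless of whether fd is a terminal, a file or a socket. The sketch below is an assumption about how the assignment could be met, not something tested in this thread; the buf reservation and the test main are made up for the example.

```nasm
bits 32

section .bss
buf resb 64                     ; test buffer (assumed name and size)

section .text

global _start
_start:                         ; "test main", not part of the final code
        push 64                 ; len
        push buf                ; buf
        push 0                  ; fd 0 = stdin
        call l_gets
        add esp, 12
        mov ebx, eax            ; bytes read becomes the exit code
        mov eax, 1              ; sys_exit
        int 0x80

global l_gets
l_gets:
        push ebp
        mov ebp, esp
        push ebx                ; callee-saved registers used below
        push esi
        push edi

        mov ebx, [ebp + 8]      ; fd
        mov edi, [ebp + 12]     ; next free byte in buf
        mov esi, [ebp + 16]     ; bytes we are still allowed to store
.next_byte:
        cmp esi, 0
        jle .done               ; buffer full (or len was zero or negative)
        mov eax, 3              ; sys_read
        mov ecx, edi            ; read into the next free byte
        mov edx, 1              ; one byte at a time
        int 0x80
        cmp eax, 1
        jne .done               ; 0 = end of file, negative = error
        dec esi
        inc edi
        cmp byte [edi - 1], 0x0A
        jne .next_byte          ; keep going until a linefeed has been stored
.done:
        mov eax, edi
        sub eax, [ebp + 12]     ; bytes stored = current position - start of buf
        pop edi                 ; restore in reverse order
        pop esi
        pop ebx
        pop ebp
        ret
```

Because the count is computed from the buffer position, an error or end of file before any byte arrives yields zero, and a linefeed is stored and counted as the assignment text requires.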


Offline turtle13

« Reply #14 on: September 06, 2017, 02:04:54 AM »
The point of this assignment is to use these functions for our next assignment, which makes a socket call to a web server and downloads a .html or .txt file, so it is supposed to be reading plain text (no binary).

I played with it a little more. I would like to use the stack as the buffer and push each byte onto the stack, and use esp as the pointer to the buffer.

Code: [Select]
bits 32

section .data



section .text

global l_gets

l_gets:
        push ebp                ; prologue, set up stack frame
        mov ebp, esp

        xor eax, eax            ; zero eax to prepare for syscall #
        push ebx                ; preserve ebx

        mov ebx, [ebp + 8]      ; fd parameter goes into ebx
        mov ecx, [ebp + 12]     ; char *buf stored into ecx
        mov edx, [ebp + 16]     ; len stored into edx

        add esp, 12             ; will use esp for the pointer to the buffer, the bytes to be read will be pushed onto stack
       
        cmp edx, 0              ; if len is zero or less, exit program
        jle .done       
        ; read data onto stack:
        .buf_loop:
               
                mov eax, 3              ; sys call for read, to begin reading bytes
                int 0x80               
                ; read each character one at a time, increment counter (in eax), when counter matches len, jump out of loop
                push esp

                add esp, 4              ; advance to next character
                inc eax                 ; advance the counter
                cmp edx, eax
                je .done

; ignore stuff below this for now
        .loop1:
       

                cmp register, byte 0x0A           ; check for newline, exit loop if true
                je .done

        .done:

is this making sense?
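
A note on using the stack as the buffer: in the l_gets prototype the buffer belongs to the caller (it arrives as buf), so l_gets itself should not need one, and "add esp, 12" inside the function moves esp above the saved registers and return address, which later pushes or interrupts can overwrite. If a stack buffer is wanted, the usual pattern is for the caller to reserve it with "sub esp, N" and pass its address. The sketch below shows only that pattern, calling sys_read and sys_write directly; the size 64 is an arbitrary assumption.

```nasm
bits 32

section .text

global _start
_start:
        sub esp, 64             ; reserve 64 bytes of stack as the buffer
        mov ecx, esp            ; ecx = address of the buffer

        mov eax, 3              ; sys_read
        mov ebx, 0              ; fd 0 = stdin
        mov edx, 64             ; never read more than was reserved
        int 0x80                ; eax = bytes read, or negative on error

        test eax, eax
        jle .exit               ; nothing read, or an error: skip the echo

        mov edx, eax            ; write back exactly what was read
        mov eax, 4              ; sys_write
        mov ebx, 1              ; fd 1 = stdout
        int 0x80                ; ecx still points at the buffer

.exit:
        add esp, 64             ; give the buffer back
        mov eax, 1              ; sys_exit
        xor ebx, ebx            ; exit code 0
        int 0x80
```

In a caller of l_gets, the same "sub esp, 64" would be followed by pushing the length, the buffer address and the fd, in that order, before the call.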