Topic: Help with writing custom C type string functions using NASM

Offline Frank Kotler

  • NASM Developer
  • Hero Member
  • *****
  • Posts: 2667
  • Country: us
« Reply #15 on: September 06, 2017, 03:50:23 AM »
Well, no. This may be my fault. Shortly after my last post, I lost my computer, and it's taken me 'til just now to beat it into rebooting. I'm pretty frazzled, and haven't looked at any of the stuff I said I'd look at.

You're pretty far from a socket call. You can read from a socket with sys_read, but recv is more common. I don't see where this is going at all.

You can use the stack for a buffer. You want to subtract something from esp, not add it. You almost certainly do not want to use esp as a pointer into it.
Code: [Select]
push esp
add esp, 4
gets you right back where you were before pushing esp. It may be my confused mental state, but I don't see what you're trying to accomplish here. You can not push a byte. You can push a word, but you don't usually want to.

You can read bytes one at a time with sys_read. It'll be slower than a gut-shot wolf bitch with nine suckling pups dragging a number nine trap uphill in a snowstorm, but you can do it. I don't see the point, when sys_read does exactly what your assignment describes.

Let me try to get myself organized and see if I can get back into this.

Best,
Frank


Offline turtle13

  • Jr. Member
  • *
  • Posts: 73
« Reply #16 on: September 06, 2017, 06:10:46 AM »
Frank, you are killing me here with your colloquialisms  ;D ;D I needed the laugh after the stress this class is putting me through.

So I cleaned up the code a bit and I think I got it closer to where I need to be:

Code: [Select]
bits 32

section .data



section .text

global l_gets

l_gets:
        push ebp                ; prologue, set up stack frame
        mov ebp, esp

        xor eax, eax            ; zero eax to prepare for syscall #
        push ebx                ; preserve ebx

        mov ebx, [ebp + 8]      ; fd parameter goes into ebx
        mov ecx, [ebp + 12]     ; char *buf stored into ecx
        mov edx, [ebp + 16]     ; len stored into edx
       
        cmp edx, 0              ; if len is zero or less, exit program
        jle .done       
        ; read data onto stack:
        .buf_loop:
   
                push esp
                cmp esp, byte 0x0A         ; if a newline character is encountered, exit
                je .done       
                sub esp, 4              ; will use esp for the pointer to the buffer, the bytes to be read will be pushed onto stack           
                mov eax, 3              ; sys call for read, to begin reading bytes- for sys read, ebx= int (fd), ecx= char, edx= size_t (len)
                int 0x80               
                ; read each character one at a time, increment counter (in eax), when counter matches len, jump out of loop

                inc eax                 ; advance the counter
                cmp edx, eax
                je .done
                jmp .buf_loop           ; continue


        .done:
                mov eax, 1              ; sys exit
                mov ebx, edx            ; return value is number of bytes written (len)
                int 0x80

Maybe you could help me write a test main like you did before so I can test this one out? It is at least "compiling" (or do we say assembling?) fine.
« Last Edit: September 06, 2017, 06:13:24 AM by turtle13 »

Frank Kotler
« Reply #17 on: September 06, 2017, 09:08:24 AM »
We say "assembling". It is correct to call it "compiling" (an assembler is a compiler for assembly language)... but the asm-heads will think you're a newbie if you do.

This is a bare minimum. I commented out a lot of your code. You do a bunch of stuff with esp before you even read anything. I moved the exit code up into the "test main" where it belongs. Your caller is not likely to be pleased if you exit in the middle of the function. The function wants an epilogue and return (with the count in eax... where sys_read puts it).

It only tests reading from stdin. In order to test with a file, we'd need a file descriptor. We could put a sys_open in the test main, or... you're going to write one, right?

You seem to be very concerned with finding that linefeed. It's at [ecx + eax -1] - sys_read stops when it sees it. If it's not there, the pesky user has typed more than we had room for... and we probably should "flush" that. If we don't, it'll screw up the next read... which may be after the program has exited! Try it! Type 10 characters and "ls" before you hit "enter" and see what happens. "ls" is harmless, but "rm ." is not!
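To make the "flush" idea concrete, here is a minimal sketch, separate from the listing in this post and not heavily tested. It is a stand-alone program that reads up to 10 bytes from stdin, checks whether the last byte read is a linefeed, and if not, keeps reading single bytes into a scratch byte until the linefeed turns up. The names buf and junk and the 10-byte size are assumptions for illustration only.

```nasm
; Sketch: read one line from stdin and discard anything the user typed
; beyond the buffer, so it cannot leak into the next read (or the shell).
bits 32

section .bss
buf     resb 10                 ; illustrative buffer, same size as the test main
junk    resb 1                  ; one-byte scratch area used only for flushing

section .text
global _start
_start:
        mov eax, 3              ; sys_read
        xor ebx, ebx            ; fd 0 = stdin
        mov ecx, buf
        mov edx, 10
        int 0x80                ; eax = bytes read, 0 at end of input, negative on error
        test eax, eax
        jle .exit               ; nothing was read, so nothing to flush

        cmp byte [ecx + eax - 1], 10    ; does the data end with a linefeed?
        je .exit                ; yes: the whole line fit in buf

.flush:
        mov eax, 3              ; sys_read again, one byte at a time
        xor ebx, ebx
        mov ecx, junk
        mov edx, 1
        int 0x80
        cmp eax, 1
        jne .exit               ; end of input or error: stop flushing
        cmp byte [junk], 10     ; reached the end of the typed line?
        jne .flush

.exit:
        mov eax, 1              ; sys_exit
        xor ebx, ebx            ; status 0
        int 0x80
```

Assembled with something like "nasm -f elf32 flush.asm" and linked with "ld -m elf_i386 -o flush flush.o" (the file name is made up), typing more than 10 characters followed by "ls" should no longer hand the leftovers to the shell.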

Reading from a real file won't do that. It will stop after edx bytes and there's no "keyboard buffer". It "may" stop earlier if it sees a linefeed. I kinda don't think so, but I need to try it. If you're going to be reading from a socket, what you'll actually see is carriage return linefeed pairs. HTTP is fussy about that! It would make a lot of sense to keep reading past them until we've got the whole file or a full buffer. But the assignment seems to say it wants us to stop...

Code: [Select]
; nasm -f elf32 l_gets.asm -d TESTMAIN
; ld -m elf_i386 -o l_gets l_gets.o   (-m elf_i386 is needed when linking on a 64-bit system)

bits 32

%ifdef TESTMAIN
section .data

section .bss
buf resb 10

section .text
global _start
_start:

; test a call to it

push 10 ; length
push buf
push 0 ; stdin
call l_gets
add esp, 4 * 3

;print what we l_getsed

mov eax, 4  ;sys_write
mov ebx, 1 ; stdout
mov ecx, buf
mov edx, 10
int 80h

exit:
                mov eax, 1              ; sys exit
                mov ebx, edx            ; return value is number of bytes written (len)
                int 0x80
%endif
;-----------------------------
section .text

global l_gets

l_gets:
        push ebp                ; prologue, set up stack frame
        mov ebp, esp

;        xor eax, eax            ; zero eax to prepare for syscall #
        push ebx                ; preserve ebx

        mov ebx, [ebp + 8]      ; fd parameter goes into ebx
        mov ecx, [ebp + 12]     ; char *buf stored into ecx
        mov edx, [ebp + 16]     ; len stored into edx
       
        cmp edx, 0              ; if len is zero or less, exit program
        jle .done       
        ; read data onto stack:
        .buf_loop:
   
;                push esp
;                cmp esp, byte 0x0A         ; if a newline character is encountered, exit
;                je .done       
;                sub esp, 4              ; will use esp for the pointer to the buffer, the bytes to be read will be pushed onto stack           


                mov eax, 3              ; sys call for read, to begin reading bytes- for sys read, ebx= int (fd), ecx= char, edx= size_t (len)
                int 0x80               
                ; read each character one at a time, increment counter (in eax), when counter matches len, jump out of loop

;                inc eax                 ; advance the counter
;                cmp edx, eax
;                je .done
;                jmp .buf_loop           ; continue


        .done:
pop ebx ; restore caller's reg
; epilogue
mov esp, ebp
pop ebp
ret
Tested (a little) but incomplete...

Best,
Frank


turtle13
« Reply #18 on: September 06, 2017, 06:12:05 PM »
Should I not leave in the last part of the .buf_loop:

Code: [Select]

                inc eax                 ; advance the counter
                cmp edx, eax
                je .done
                jmp .buf_loop           ; continue


Frank Kotler
« Reply #19 on: September 06, 2017, 07:00:03 PM »
I don't see why. After your sys_read, eax is the number of bytes actually read. edx is the maximum number to read. You're counting the number of bytes not read? There's something you're trying to do with this ".buf_loop" that I'm not getting. Quite possibly my fault...
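A rough sketch of what that return value looks like in practice, assuming the usual 32-bit Linux int 0x80 convention. The name read_once is made up for illustration and is not part of the assignment.

```nasm
; Sketch: one sys_read, with the three possible outcomes spelled out.
bits 32

section .text
global read_once                ; hypothetical name

; int read_once(int fd, char *buf, int len)   (cdecl)
read_once:
        push ebp
        mov ebp, esp
        push ebx                ; ebx is callee-saved under cdecl

        mov ebx, [ebp + 8]      ; fd
        mov ecx, [ebp + 12]     ; buf
        mov edx, [ebp + 16]     ; len = the MOST sys_read may store
        mov eax, 3              ; sys_read
        int 0x80                ; eax > 0 : bytes actually read (may be less than edx)
                                ; eax = 0 : end of file
                                ; eax < 0 : negated errno value
        test eax, eax
        jns .ok
        mov eax, -1             ; collapse any error into -1 for a C caller
.ok:
        pop ebx                 ; restore caller's ebx
        pop ebp
        ret
```

Since eax already holds the count when the call returns, there is nothing left to count by hand. Comparing eax against edx only tells you whether the buffer was filled completely.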

Put it back in and see what it does, if you like...

Best,
Frank


turtle13
« Reply #20 on: September 06, 2017, 08:06:21 PM »
- How do you declare a variable that holds a dynamically sized amount (for buf)? Right now buf is only big enough for 10 characters.
- The program requirement is for it to stop reading when it reaches a linefeed; I think this is for parsing in a subsequent networking assignment. Why is the newline character at [ecx + eax -1]?
- Why are we only preserving and restoring ebx and not ecx or edx?
« Last Edit: September 06, 2017, 08:30:30 PM by turtle13 »

Frank Kotler
« Reply #21 on: September 06, 2017, 08:38:28 PM »
You can get more memory with sys_brk... but you shouldn't have to worry about that. It's the caller's responsibility to provide the buffer and tell you how big it is. To get more buffer in "test main" just make it bigger.

The linefeed's at [ecx + eax -1] 'cause that's where it is. sys_read returns the number of bytes read in eax, and it stops when (doesn't stop until) it gets the linefeed. ecx, of course, is the beginning of the buffer.

As we've discussed, this is (probably) going to be different with a real file... or a socket. If the assignment really requires it, you could read one byte at a time (put 1 in edx and ignore what the caller tells you). Makes more sense to simply find it in the buffer...
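Finding it in the buffer could look something like the sketch below. The helper name line_length is an assumption for illustration. It takes the buffer and the byte count that sys_read returned, and gives back the length of the first line, linefeed included. It only touches eax, ecx and edx, which a cdecl caller does not expect to survive a call anyway, so there is nothing to save and restore. That is also why the earlier code only preserves ebx.

```nasm
; Sketch: bounded scan for the first linefeed in a buffer.
bits 32

section .text
global line_length              ; hypothetical helper name

; int line_length(const char *buf, int n)   (cdecl)
; Returns the number of bytes up to and including the first linefeed in
; buf[0 .. n-1]. Returns n if there is no linefeed, and 0 if n <= 0.
line_length:
        mov ecx, [esp + 4]      ; buf
        mov edx, [esp + 8]      ; n = bytes that sys_read reported
        xor eax, eax            ; eax = index of the byte being examined
.scan:
        cmp eax, edx
        jge .done               ; ran out of data without seeing a linefeed
        cmp byte [ecx + eax], 10
        je .found
        inc eax
        jmp .scan
.found:
        inc eax                 ; count the linefeed itself
.done:
        ret
```

The bound check matters: without it, a buffer that holds no linefeed would send the scan off the end of the buffer.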

Best,
Frank



turtle13
« Reply #22 on: September 06, 2017, 09:09:17 PM »
Will this code do what I need it to? It works using the test main and a preset buffer, but if it has to take in a variable amount of text, will it do that and return the number of bytes read as the exit code, even if a newline character is located before "len" bytes have been read?


Code: [Select]
global l_gets

l_gets:
        push ebp                ; prologue, set up stack frame
        mov ebp, esp

        push ebx                ; preserve ebx

        mov ebx, [ebp + 8]      ; fd parameter goes into ebx
        mov ecx, [ebp + 12]     ; char *buf stored into ecx
        mov edx, [ebp + 16]     ; len stored into edx
       
        cmp edx, 0              ; if len is zero or less, exit program
        jle .done       

        cmp ecx, [ecx + eax -1] ; compare current character to new line, exit if true
        je .done
        ; read data onto stack:

        mov eax, 3              ; sys call for read, to begin reading bytes- for sys read, ebx= int (fd), ecx= char, edx= size_t (len)
        int 0x80               



        .done:

                pop ebx                 ; restore the caller's ebx
                mov esp, ebp            ; epilogue- clean up the stack
                pop ebp
                ret

Frank Kotler
« Reply #23 on: September 06, 2017, 09:25:35 PM »
Something like this?
Code: [Select]
bits 32

%ifdef TESTMAIN
section .data

section .bss
buf resb 80

section .data
filename db `l_gets.asm\0` ; ourself - we know it's there


section .text
global _start
_start:

mov eax, 5 ; sys_open
mov ebx, filename
xor ecx, ecx
xor edx, edx
int 80h
test eax, eax
js exit ; bail out if error

; test a call to it

push 80 ; length
push buf
push eax ; fd
call l_gets
add esp, 4 * 3

; find NL - we want to stop there
xor edx, edx
find:
cmp [ecx + edx], byte 10
je found
inc edx
jmp find
found:
; edx is length to print

;print what we l_getsed

; mov edx, eax ; length read
mov eax, 4  ;sys_write
mov ebx, 1 ; stdout
mov ecx, buf
int 80h

exit:
                mov eax, 1              ; sys exit
                mov ebx, edx            ; return value is number of bytes written (len)
                int 0x80
%endif
;-----------------------------


global l_gets

l_gets:
        push ebp                ; prologue, set up stack frame
        mov ebp, esp

;        xor eax, eax            ; zero eax to prepare for syscall #
        push ebx                ; preserve ebx

        mov ebx, [ebp + 8]      ; fd parameter goes into ebx
        mov ecx, [ebp + 12]     ; char *buf stored into ecx
        mov edx, [ebp + 16]     ; len stored into edx
       
        cmp edx, 0              ; if len is zero or less, exit program
        jle .done       
        ; read data onto stack:
        .buf_loop:
   
;                push esp
;                cmp esp, byte 0x0A         ; if a newline character is encountered, exit
;                je .done       
;                sub esp, 4              ; will use esp for the pointer to the buffer, the bytes to be read will be pushed onto stack           
                mov eax, 3              ; sys call for read, to begin reading bytes- for sys read, ebx= int (fd), ecx= char, edx= size_t (len)
                int 0x80               
                ; read each character one at a time, increment counter (in eax), when counter matches len, jump out of loop

;                inc eax                 ; advance the counter
;                cmp edx, eax
;                je .done
;                jmp .buf_loop           ; continue


        .done:
pop ebx ; restore caller's reg

mov esp, ebp
pop ebp
ret

Could put the "find linefeed" part in the function. That may be what the assignment expects?

Best,
Frank

Ahhh, just read your latest. I think your code will work reading from stdin. (ahhh, no it won't - ecx is not likely 0xA - especially before the sys_read!) This tests from a real file. As I expected, it reads past the linefeed. Comment out the "find LF" part to see what it does without it...



turtle13
« Reply #24 on: September 07, 2017, 10:06:01 PM »
Frank,

At this point my brain is feeling clobbered and I want to break this simple function down step by step.

First off, I would just like to write the part of the code that pushes each byte onto the stack to be read.

Would I implement this as a something like:

Code: [Select]
sys read
push buf
pop ecx
add [buf], 4
inc counter
if counter == len OR if ecx == 0x0A
jmp done

Is this the correct way? I am having a tough time understanding how the stack can be used to store one byte at a time as a kind of "buffer," but my professor says that it can be done. He isn't too keen on giving us the exact code on how to do it, however.

We're gonna have to pop each byte as it is pushed onto the stack, correct?
« Last Edit: September 07, 2017, 10:28:20 PM by turtle13 »

turtle13
« Reply #25 on: September 07, 2017, 10:49:59 PM »
Here is a stack diagram I drew to try to visualize how this code is working:

Code: [Select]

bits 32

section .text

l_gets:
        push ebp        ; prologue
        mov ebp, esp
        push ebx

        mov ebx, [ebp + 8]        ; set up registers on stack for args
        mov ecx, [ebp + 12]
        mov edx, [ebp + 16]

        cmp edx, 0
        jle .done

        .read_loop:
                mov eax, 3    ; sys call read
                int 0x80

                push buf        ; push buf onto a stack because this is where each character is going to be stored
                pop ecx         ; pops the current character in ecx into buf
                ; this part I am getting hung up on, how to increment to the next character to be read in from whatever file is coming in

                cmp edx, char_count ; jump if number of bytes read has reached "len"
                je .done

        .done:
                pop ebx
                mov esp, ebp        ; epilogue
                pop ebp
                ret

« Last Edit: September 07, 2017, 11:56:38 PM by turtle13 »

turtle13
« Reply #26 on: September 08, 2017, 04:29:33 AM »
I was able to write this functioning l_gets:

Code: [Select]
bits 32

section .text

global l_gets

l_gets:
        push ebp                ; prologue
        mov ebp, esp

        push ebx                ; preserve ebx

        mov ebx, [ebp + 8]      ; fd parameter into ebx
        mov ecx, [ebp + 12]     ; char *buf in ecx
        mov edx, [ebp + 16]     ; len in edx

        cmp edx, 0              ; if len zero or less, exit
        jle .done

        mov eax, 3              ; sys read
        int 0x80
       
        .done:
                pop ebx                 ; restore ebx
                mov esp, ebp            ; epilogue
                pop ebp
                ret

When using your test main though, you move 10 to edx (for 10 characters). Even though I type fewer than 10 characters into stdin, I always get a return value of 10. The program should return only the number of bytes read. This, and figuring out how to stop input after the newline character is read, are all that's left and I'm pretty much done with this one. On to the next...

Frank Kotler
« Reply #27 on: September 08, 2017, 04:37:25 AM »
Ahhh... your instructor is apparently not explaining this well. I probably won't explain it well, either. Such is life...

You can indeed put one byte on the stack, but forget about "push" and "pop" to do so. First, let us create a "local" variable on the stack to use as a "buffer":
Code: [Select]
func:
    push ebp
    mov ebp, esp
    sub esp, 64
    ...

The amount we subtract from esp should be a multiple of 4, just to keep the stack well aligned. It could be more (or less) but probably shouldn't be more than 4096 (a "page"). So we've got a local variable that we can use as a buffer (or anything else). Its address is ebp - 64. That's the address of the first (lowest) byte, as usual.

I guess what you're trying to do is to read one byte at a time into this buffer - instead of the buffer the caller has provided? If so, we ignore the buffer the caller has provided, and the length. We still need the file descriptor...
Code: [Select]
func:
    push ebp
    mov ebp, esp
    sub esp, 64
    mov ebx, [ebp + 8]
    lea ecx, [ebp - 64]
    xor esi, esi ; to use as counter
    mov edx, 1 ; read just one byte
.top:
    mov eax, 3 ; sys_read
    int 0x80
    cmp [ecx], byte 0xA ; linefeed?
    je .done
    inc ecx ; next byte in buffer
    inc esi ; count it
    cmp esi, 64 ; we're out of local buffer
    je .done
    cmp esi, [ebp + 16] ; caller's idea of length
    je .done
    jmp .top ; go get another one
.done:
    ; do something intelligent...
    ; clean up and go home
We may need more than one ".done:"  label here... I don't think I'm too fond of this approach.

I think what I would rather do is a perfectly ordinary sys_read into the buffer the caller provides for the length the caller provides. Then... if the file descriptor is stdin, make sure that we do have that linefeed. If not, read into a dummy buffer until we find it to flush the keyboard buffer. This is complicated by the fact that *nix is fairly tolerant about file descriptors. If we try to read from stdout or even stderr, we still read from the keyboard (as long as all three are attached to the terminal).

Linux has no stdaux (that was DOS); descriptor 3 is simply the first one the program opens for itself. If the file descriptor is 3 or greater, we presumably have a disk file or socket. This will read past the linefeed. In the case of a disk file, we could seek back to where the linefeed was found. That won't work on a socket, which can't seek. We may have no choice but to read one byte at a time. I think I'd still read into the buffer the caller provides...
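The seek-back idea for disk files could be sketched like this, assuming sys_lseek (number 19 on 32-bit Linux) with SEEK_CUR. The helper name rewind_to_line_end and its parameters are made up for illustration: nread is what sys_read returned, and used is how many of those bytes belong to the current line.

```nasm
; Sketch: hand the bytes after the linefeed back to the file, so the
; next read starts at the beginning of the next line.
bits 32

section .text
global rewind_to_line_end       ; hypothetical helper name

; int rewind_to_line_end(int fd, int nread, int used)   (cdecl)
; Moves the file position back by (nread - used) bytes.
; Returns the new file offset, or a negated errno value on failure
; (for example on a pipe or socket, which cannot seek).
rewind_to_line_end:
        push ebx                ; callee-saved; after this push the
                                ; arguments sit at esp + 8, + 12, + 16
        mov ebx, [esp + 8]      ; fd
        mov ecx, [esp + 16]     ; used
        sub ecx, [esp + 12]     ; ecx = used - nread (zero or negative offset)
        mov edx, 1              ; SEEK_CUR: relative to the current position
        mov eax, 19             ; sys_lseek
        int 0x80
        pop ebx
        ret
```

If this comes back negative, the descriptor is not seekable and reading one byte at a time is the fallback.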

Quote
We're gonna have to pop each byte as it is pushed onto the stack, correct?

I don't think so. Why push it at all if you're only going to pop it again? I'm still not sure I understand what you're trying to do here.

Your diagram looks pretty good except that you've left out the return address (below the parameters and above old ebp). Also, esp points at what you pushed last, not below it yet...

Code: [Select]
push buf
pushes the address of buf - probably not useful. You could push four bytes and ignore all but one. To increment an address, you want to put it in a register and increment that.
Code: [Select]
; if ecx is caller's buffer...
lea edi, [ebp - 64] ; our buffer on stack
.top:
mov al, [ecx]
mov [edi], al
inc edi ; or use stosb
inc ecx
jmp .top
That will need an exit condition, of course. I've ignored preserving registers for now. Beside ebx (used by system calls) we'll probably need both esi and edi. That can be added - just trying to keep it simple. Doubt if I've succeeded...
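Pulling the pieces of this post together, here is one possible shape for a byte-at-a-time l_gets. It is a sketch under my own assumptions, not the assignment's official answer: it reads straight into the caller's buffer and does not use a local stack buffer, it counts the linefeed as one of the bytes read, and on an error or end of file it simply returns however many bytes it has stored so far. Check those choices against what the assignment actually asks for.

```nasm
; Sketch: read one byte per sys_read until linefeed, EOF, error, or len bytes.
bits 32

section .text
global l_gets

; int l_gets(int fd, char *buf, int len)   (cdecl)
l_gets:
        push ebp                ; prologue
        mov ebp, esp
        push ebx                ; callee-saved registers we are about to use
        push esi

        xor esi, esi            ; esi = bytes stored so far
        cmp dword [ebp + 16], 0
        jle .done               ; len <= 0: nothing to do, return 0

        mov ebx, [ebp + 8]      ; fd
        mov ecx, [ebp + 12]     ; ecx = where the next byte goes
        mov edx, 1              ; always ask for exactly one byte
.next:
        mov eax, 3              ; sys_read (eax is overwritten by each call)
        int 0x80
        cmp eax, 1
        jne .done               ; 0 = end of file, negative = error
        inc esi                 ; count the byte we just stored
        cmp byte [ecx], 10      ; was it a linefeed?
        je .done
        inc ecx                 ; advance to the next slot in buf
        cmp esi, [ebp + 16]     ; caller's buffer full?
        jl .next
.done:
        mov eax, esi            ; return value: number of bytes stored
        pop esi                 ; restore in reverse order of the pushes
        pop ebx
        mov esp, ebp            ; epilogue
        pop ebp
        ret
```

The two details that are easy to miss are that edx must be 1 (not the caller's len) for the loop to see each byte, and that eax has to be reloaded with 3 every time around because the system call replaces it with its result.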

Best,
Frank

next post: probably my fault you're getting the bad return value. Check closely what I've done...


turtle13
« Reply #28 on: September 08, 2017, 05:31:48 AM »
Thanks, some things make sense now, others still don't.

Here's what I have so far. I'm not spending any time on this one tonight.

Code: [Select]
global l_gets

l_gets:
        push ebp                ; prologue
        mov ebp, esp

        push ebx                ; preserve ebx

        mov ebx, [ebp + 8]      ; fd parameter into ebx
        mov ecx, [ebp + 12]     ; char *buf in ecx
        mov edx, [ebp + 16]     ; len in edx
        xor esi, esi            ; counter

        cmp edx, 0              ; if len zero or less, exit
        jle .done

        .char_loop:       
       
                mov eax, 3              ; sys read
                int 0x80

                cmp [ecx], byte 0xA     ; test for linefeed
                je .done

                inc ecx                 ; advance to next byte
                inc esi                 ; +count

                cmp esi, [ebp + 16]     ; does read bytes = len?
                je .done

                jmp .char_loop

               
       
        .done:
                mov eax, esi            ; # bytes read into eax               
                pop ebx                 ; restore ebx
                mov esp, ebp            ; epilogue
                pop ebp
                ret

turtle13
« Reply #29 on: September 08, 2017, 05:40:14 PM »
These additional functions I need to complete:

void l_puts(const char *buf);
write the contents of the null terminated string buf to stdout. The null byte must not be written. If the length of the string is zero, then no bytes are to be written.

int l_write(int fd, char *buf, int len);
write len bytes from buffer buf to file fd. Return the number of bytes actually written or -1 if an error occurs.

int l_open(const char *name, int flags, int mode);
opens the named file with the supplied flags and mode. Returns the integer file descriptor of the newly opened file or -1 if the file can't be opened.

int l_close(int fd);
close the indicated file, returns 0 on success or -1 on failure.

int l_exit(int rc);
terminate the calling program with exit code rc.



Can you maybe help to "guide" as far as how I should set these up so I can work on these this afternoon? Thank you again
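As a possible starting point, here is a minimal sketch of three of these, assuming the same cdecl convention and int 0x80 interface used for l_gets in this thread. It is a sketch only. In particular, l_puts makes a single sys_write and does not retry if the kernel writes fewer bytes than asked.

```nasm
; Sketch: thin wrappers around sys_write and sys_exit.
bits 32

section .text
global l_write
global l_puts
global l_exit

; int l_write(int fd, char *buf, int len)
l_write:
        push ebx                ; callee-saved; args are now at esp + 8 and up
        mov ebx, [esp + 8]      ; fd
        mov ecx, [esp + 12]     ; buf
        mov edx, [esp + 16]     ; len
        mov eax, 4              ; sys_write
        int 0x80                ; eax = bytes written, or negated errno
        test eax, eax
        jns .ok
        mov eax, -1             ; the spec asks for -1 on any error
.ok:
        pop ebx
        ret

; void l_puts(const char *buf)
l_puts:
        push ebx
        mov ecx, [esp + 8]      ; buf
        xor edx, edx            ; edx = string length so far
.count:
        cmp byte [ecx + edx], 0 ; stop at the terminating null byte
        je .write
        inc edx
        jmp .count
.write:
        test edx, edx
        jz .out                 ; empty string: write nothing at all
        mov eax, 4              ; sys_write
        mov ebx, 1              ; stdout
        int 0x80                ; writes edx bytes, so the null is not included
.out:
        pop ebx
        ret

; int l_exit(int rc)   (never returns)
l_exit:
        mov ebx, [esp + 4]      ; rc
        mov eax, 1              ; sys_exit
        int 0x80
```

l_open and l_close would follow the same pattern as l_write, with system call numbers 5 (sys_open) and 6 (sys_close) and the same "negative result becomes -1" step.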