NASM - The Netwide Assembler

NASM Forum => Programming with NASM => Topic started by: turtle13 on September 04, 2017, 06:50:15 AM

Title: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 04, 2017, 06:50:15 AM: For a class assignment I must write custom versions of C type functions such as:

strlen, strcmp, gets, puts, write, open, close, exit

From what I understand, this requires using the cdecl calling convention so I will be preserving and restoring the ebx, edi, esi, and ebp registers, and the caller will clean the stack. eax holds the return value.

So far I have started a skeleton for the strlen-type function (called l_strlen), which returns the number of characters in a given string as an int:

Code:
bits 32

section .data
; variables go here:
; var_name db values
string1 db 'string', 0          ; null terminated string
string1_len equ $ - string1     ; length of string1

section .text
global l_strlen

l_strlen:
    xor eax, eax    ; zero eax
    push eax        ; preserve eax
    push ebx        ; preserve ebx
    push edi        ; preserve edi
    push esi        ; preserve esi
    push ebp        ; prologue: set up stack frame
    mov ebp, esp

.char_loop
    ; while the byte (char) being compared is not "0"
    ; add one to ecx
    ; jmp .char_loop
    ; if the byte (char) is "0" and no characters remaining (meaning null terminated)
    ; jmp .end_loop

.end_loop
    mov esp, ebp    ; epilogue: restore caller's frame pointer
    pop ebp
    ret
    pop eax         ; this is where the final return value is located
    pop esi         ; restore esi
    pop edi         ; restore edi
    pop ebx         ; restore ebx
Questions about my code:

- For .char_loop I only have pseudocode. I'm trying to figure out exactly how to implement it (or whether this approach is even appropriate).

- How do I change the code so that the string being measured is not statically declared, like I did with the variable 'string1', so that it works as 'l_strlen(any_string)'?

- Anything else that seems off to you? (Or better yet, is anything even correct?)
Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 04, 2017, 08:28:43 AM: It's pretty much correct, but you've got some things out of order.

You don't need to "preserve" registers that you don't use. For this simple task, we can get by with registers that we're allowed to alter. That simplifies things. The "prologue", as the name suggests, wants to be the first thing in your function...

Code:
bits 32

section .data
; variables go here:
; var_name db values
string1 db 'string', 0          ; null terminated string
string1_len equ $ - string1     ; length of string1

section .text
;-----------------------------------------------
; this is a "test main" it should not be in your final code
global _start
_start:
    push string1        ; address of string1
    call l_strlen
    add esp, 4          ; "remove" parameter
    ; length is returned in eax
    ; make it our exit code
    mov ebx, eax
    mov eax, 1          ; sys_exit
    int 0x80
; end of "test main"
;-------------------------------------------

global l_strlen
l_strlen:
    ; xor eax, eax      ; zero eax
    ; push eax          ; preserve eax
    ; push ebx          ; preserve ebx
    ; push edi          ; preserve edi
    ; push esi          ; preserve esi

    ; this part we do need:
    push ebp            ; prologue: set up stack frame
    mov ebp, esp
    ; if we needed to preserve registers, do it here

    xor eax, eax        ; since we want the result in eax
    mov ecx, [ebp + 8]  ; first (only) parameter

.char_loop
    ; while the byte (char) being compared is not "0"
    ; for clarity: the byte we're looking for is the number zero
    ; not the character "0". They're not the same thing!
    mov dl, [ecx]
    cmp dl, byte 0
    jz .end_loop
    inc eax             ; increase counter
    inc ecx             ; move to next character
    ; add one to ecx
    ; jmp .char_loop
    jmp .char_loop
    ; if the byte (char) is "0" and no characters remaining (meaning null terminated)
    ; jmp .end_loop

.end_loop
    ; if we had preserved registers, pop 'em here
    mov esp, ebp        ; epilogue: restore caller's frame pointer
    pop ebp
    ret

; this stuff after "ret" would never be reached anyway
; pop eax               ; this is where the final return value is located
; pop esi               ; restore esi
; pop edi               ; restore edi
; pop ebx               ; restore ebx
That's untested. I should know better than to post untested code, but it's late here...

As you can see, I've added a "test main" so that you can assemble and link the code and run it. As you probably know, we can see the exit code by typing "echo $?". Only the low byte of the exit code survives (0 to 255), but that should be enough for short strings. I think I've got it right, but no promises...

Best,
Frank
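For comparison, here is the same loop as a C reference model, a sketch to check the asm against rather than part of the assignment (the name l_strlen_ref is hypothetical). Each statement is commented with the instruction it stands in for in Frank's version, and the loop tests for the value 0 ('\0'), not the character '0' (value 48), which is the distinction his code comment makes.

```c
/* Reference model of the l_strlen loop.
   Assumption: s points to a valid NUL-terminated string. */
int l_strlen_ref(const char *s)
{
    int count = 0;              /* xor eax, eax        */
    const char *p = s;          /* mov ecx, [ebp + 8]  */
    while (*p != '\0') {        /* cmp byte 0 / jz .end_loop */
        count++;                /* inc eax             */
        p++;                    /* inc ecx             */
    }
    return count;               /* result left in eax  */
}
```

Because the string is passed by address, the same function works for any NUL-terminated string, which is also why "push string1 / call l_strlen" in the test main works without l_strlen knowing anything about string1.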
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 05, 2017, 02:46:12 AM: Frank, your advice worked perfectly! I compiled the program with the short "main" function and it returns the length of "string1" as the exit code!

Now I'm assuming that I don't need to leave in the
Code:
string1 db 'string', 0          ; null terminated string
string1_len equ $ - string1     ; length of string1
part of the code, because this function should work on a string of any length. Should I just delete those two lines, or is anything else required to make this happen?

Why is "dl" used in 'mov dl, [ecx]'? If I understand it, dl is the low-order byte of the edx register, but how do edx and dl come into play here?

Thanks again!
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 05, 2017, 03:26:16 AM: Moving on to the next C function, strcmp:

Instructions:
int l_strcmp(char *str1, char *str2);
return 0 if str1 and str2 are equal, return 1 if they are not. Note that this is not the same definition as the C standard library function strcmp.

Here is my code so far:

Code:
bits 32

section .data
string1 db 'hello', 0
string2 db 'hello', 0
string3 db 'Hello!', 0

section .text
global l_strcmp

l_strcmp:
    push ebp            ; prologue: set up stack frame
    mov ebp, esp
    xor eax, eax        ; zero eax to prepare for storing result (0= equal, 1= not equal)
    mov ecx, [ebp + 8]  ; first parameter (string1) stored in ecx
    mov edx, [ebp + 12] ; second parameter (string2) stored in edx

.char_loop:             ; code to compare every character in both strings
    mov cl, [ecx]       ; move the current character into the cl segment of ecx
    mov dl, [edx]       ; move the current character into the dl segment of edx
    cmp dl, cl
    jne .done_1         ; if char in string1 != string2, exit with result 1
    ; how to examine if the null terminator has been met and both strings match?
    jmp .char_loop      ; continue examining characters

.done_1:
    mov eax, 1          ; returns 1 when strings do not match
    mov esp, ebp        ; epilogue: restore caller's frame pointer
    pop ebp
    ret

.done_0:
    mov eax, 0          ; returns 0 when strings do match
    mov esp, ebp
    pop ebp
    ret
I'm a bit unclear on what "cl" and "dl" are doing and need some clarification on that, as well as on whether the loops appear to work properly.
Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 05, 2017, 03:40:55 AM: Good. What I found when I tried it was that Nasm burped up a couple of warnings about the lack of colons on a couple of labels. Just put colons on 'em, or that warning can be turned off.

The "string" can be considered part of the "test main". You don't need it in the final code.

The "dl" register was just someplace to put the single byte we're looking at. You don't need that, either. Could be done as:
Code:
    cmp [ecx], byte 0
I was just trying to implement "something" from your pseudo-code. edx, and its "parts" dl and dh, are "volatile" according to the cdecl calling convention. We don't have to preserve it... so I used it.

We can get by without the stack frame, too. If we don't meddle with ebp, the first parameter is at [esp + 4]. We probably "should" use a stack frame, though - it allows a debugger to do a "back trace" to see where we were called from.

If you'll step into the museum for a moment... In 16-bit code, only bx and bp could be used for "base" registers. [sp + ?] was not a valid addressing mode. We had no choice but to...
Code:
    push bp
    mov bp, sp
    mov ax, [bp + 4]
or whatever. 32-bit addressing modes are much more flexible - any register can be a "base" register, so we can use [esp + 4], etc. Still, it is common to set up a stack frame...

If I'm feeling ambitious, I may work up a "super short" version of this. Probably not...

Best,
Frank
Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 05, 2017, 03:48:36 AM: You're getting ahead of me... you Hare! :)

Quote
A bit of misunderstanding on what "cl" and "dl" are doing..
Since cl and dl are parts of ecx and edx, they're trashing ecx and edx so they no longer will point to your strings. Use some other 8-bit registers - al and ah, perhaps.

Best,
Frank
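A small C model of the register overlap Frank describes: writing an 8-bit register replaces only its slice of the 32-bit register, so "mov cl, [ecx]" keeps the upper 24 bits of the pointer but overwrites the low 8. The function names and the address used below are made up for illustration.

```c
#include <stdint.h>

/* Writing cl (or al, dl, bl): only bits 0-7 of the 32-bit register change. */
uint32_t write_low8(uint32_t reg32, uint8_t value)
{
    return (reg32 & 0xFFFFFF00u) | value;
}

/* Writing ch (or ah, dh, bh): only bits 8-15 change. */
uint32_t write_high8(uint32_t reg32, uint8_t value)
{
    return (reg32 & 0xFFFF00FFu) | ((uint32_t)value << 8);
}
```

With ecx = 0x0804A010 (an illustrative address), loading 'h' (0x68) into cl leaves ecx = 0x0804A068, which no longer points at the string. Using al and ah is safe in l_strcmp because eax is overwritten with the 0 or 1 result before returning anyway.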
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 05, 2017, 03:54:54 AM: ^ I was thinking as I was doing it that ecx and edx would get messed up somehow. This assembly stuff is so strict, yet at the same time it's the wild wild west in how you handle data and instructions.

So here is my final version of the strcmp:

Code:
bits 32

section .data
string1 db 'hello', 0
string2 db 'hello', 0
string3 db 'Hello!', 0

section .text
global l_strcmp

l_strcmp:
    push ebp            ; prologue: set up stack frame
    mov ebp, esp
    xor eax, eax        ; zero eax to prepare for storing result (0= equal, 1= not equal)
    mov ecx, [ebp + 8]  ; first parameter (string1) stored in ecx
    mov edx, [ebp + 12] ; second parameter (string2) stored in edx

.char_loop:             ; code to compare every character in both strings
    cmp [ecx], [edx]    ; compare the characters in the ecx, edx registers
    jne .done_1         ; if char in string1 != string2, exit with result 1
    cmp ecx, byte 0     ; tests for null terminator
    je .done_0          ; jump to done if null terminator
    jmp .char_loop      ; continue examining characters

.done_1:
    mov eax, 1          ; returns 1 when strings do not match
    mov esp, ebp        ; epilogue: restore caller's frame pointer
    pop ebp
    ret

.done_0:
    mov eax, 0          ; returns 0 when strings do match
    mov esp, ebp
    pop ebp
    ret
I feel I've built a suspension bridge on quicksand with this one

*OK, I just realized I forgot to increment ecx and edx. I would add:

Code:
    inc ecx
    inc edx
in .char_loop, between "je .done_0" and "jmp .char_loop".
Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 05, 2017, 04:01:20 AM: I don't think that'll even assemble, will it?

Best,
Frank
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 05, 2017, 04:09:59 AM: Nope, it's giving me an error on the line "cmp [ecx], [edx]".

*just did it!! Returns 1 when strings are different, 0 when strings are the same!

Code:
bits 32

section .data
string1 db 'hello', 0
string2 db 'hello', 0
string3 db 'Hello!', 0

section .text
;-----------------------------------------------
; this is a "test main" it should not be in your final code
global _start
_start:
    push string1        ; address of string1
    push string3
    call l_strcmp
    add esp, 8          ; "remove" parameters
    ; result (0 or 1) is returned in eax
    ; make it our exit code
    mov ebx, eax
    mov eax, 1          ; sys_exit
    int 0x80
; end of "test main"
;-------------------------------------------

global l_strcmp
l_strcmp:
    push ebp            ; prologue: set up stack frame
    mov ebp, esp
    xor eax, eax        ; zero eax to prepare for storing result (0= equal, 1= not equal)
    mov ecx, [ebp + 8]  ; first parameter (string1) stored in ecx
    mov edx, [ebp + 12] ; second parameter (string2) stored in edx

.char_loop:             ; code to compare every character in both strings
    mov al, [ecx]
    mov ah, [edx]
    cmp al, ah          ; compare the characters that ecx and edx point to
    jne .done_1         ; if char in string1 != string2, exit with result 1
    cmp al, byte 0      ; tests for null terminator
    je .done_0          ; jump to done if null terminator
    inc ecx
    inc edx
    jmp .char_loop      ; continue examining characters

.done_1:
    mov eax, 1          ; returns 1 when strings do not match
    mov esp, ebp        ; epilogue: restore caller's frame pointer
    pop ebp
    ret

.done_0:
    mov eax, 0          ; returns 0 when strings do match
    mov esp, ebp
    pop ebp
    ret
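As with l_strlen, the working loop can be mirrored in C as a reference model of the assignment's l_strcmp (0 if equal, 1 if not; not the standard strcmp). The name l_strcmp_ref is hypothetical. The loop only has to test one string for the terminator: once the two bytes are known to be equal, if one is 0 the other is too.

```c
/* Reference model of the assignment's l_strcmp: 0 if equal, 1 otherwise. */
int l_strcmp_ref(const char *s1, const char *s2)
{
    for (;;) {
        char a = *s1;           /* mov al, [ecx] */
        char b = *s2;           /* mov ah, [edx] */
        if (a != b)
            return 1;           /* jne .done_1 */
        if (a == '\0')
            return 0;           /* je .done_0: both strings ended together */
        s1++;                   /* inc ecx */
        s2++;                   /* inc edx */
    }
}
```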
Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 05, 2017, 04:53:12 AM: There ya go!

Now... the real C "gets()" is notoriously unsafe. Some versions of gcc will warn if you try to use it. Instead, make the caller tell you how big the buffer is, and don't "get" any more than that. Please!

Best,
Frank
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 05, 2017, 05:27:49 AM: Now on to l_gets:

instructions:

int l_gets(int fd, char *buf, int len);
read at most len bytes from file fd, placing them into buffer buf. Terminate early if a newline character ('\n', 0x0A) is read. If a newline character is encountered, it should be stored into the output buffer and counted in the total number of bytes read. Return the total number of bytes read (which may be zero if end of file is reached or an error occurs). This function does not place a null termination character after the last character read. That is the responsibility of the caller.

Here is some code I have so far, I just want to make sure I am setting it up correctly:

Code:
bits 32

section .data

section .text
global l_gets

l_gets:
    push ebp            ; prologue, set up stack frame
    mov ebp, esp
    xor eax, eax        ; zero eax to prepare for storing return result
    mov ecx, [ebp + 8]  ; third parameter (int len) stored into ecx
    mov edx, [ebp + 12] ; second parameter (char *buf) stored into edx
    mov esi, [ebp + 16] ; first parameter (int fd) stored into esi
^Since cdecl parameters are passed right to left, that is why I'm reading them off the stack in that order. Not sure if this is correct.

I'm lost as to where and how to store the buffer data. Should I get the value of len and loop that many times, writing the data to the buffer (which would be edx according to my code above)?
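One way to pin down the l_gets specification quoted above before writing the asm is a C reference model built on the read(2) system call, the same call sys_read (eax = 3) makes. This is a sketch, not the required implementation, and l_gets_ref is a hypothetical name. It reads one byte per call, so it never consumes anything past the linefeed, and it assumes that if EOF or an error happens partway through a line, the bytes already read are still counted.

```c
#include <unistd.h>

/* Reference model of the assignment's l_gets (not the C library gets):
   read at most len bytes from fd into buf, stop after a '\n' (which is
   stored and counted), return the number of bytes stored. No NUL is added.
   Assumption: on EOF or a read error, the bytes read so far are returned
   (0 if nothing was read). */
int l_gets_ref(int fd, char *buf, int len)
{
    int count = 0;
    while (count < len) {
        ssize_t n = read(fd, &buf[count], 1);   /* one byte per system call */
        if (n <= 0)                             /* 0 = EOF, -1 = error */
            break;
        count++;
        if (buf[count - 1] == '\n')             /* newline is kept and counted */
            break;
    }
    return count;
}
```

Each read(fd, &buf[count], 1) corresponds to a sys_read with ebx = fd, ecx = buf + count and edx = 1, so the caller's buffer is filled directly and no stack buffer is needed.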
Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 05, 2017, 06:27:49 AM: Looks remarkably like sys_read, does it not?

Code:
bits 32

section .data

section .text
global l_gets

l_gets:
    push ebp            ; prologue, set up stack frame
    mov ebp, esp
    xor eax, eax        ; zero eax to prepare for storing return result
                        ; going to need it for the system call number, no?
    ; going to need to preserve ebx
    push ebx
    mov ecx, [ebp + 8]  ; third parameter (int len) stored into ecx
                        ; fd - going to want it in ebx
    mov edx, [ebp + 12] ; second parameter (char *buf) stored into edx
                        ; going to want it in ecx
    mov esi, [ebp + 16] ; first parameter (int fd) stored into esi
                        ; max length - going to want it in edx
    ; now do your sys_read
    ; if error (eax negative) we want it to be zero
    ; that's what it says...
    ; otherwise number of characters - like sys_read
    pop ebx
    ; epilogue...
That's how I understand it, anyway...

You may want to "flush" any excess the pesky user types...

Best,
Frank
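The parameter offsets follow from two facts: cdecl pushes arguments right to left (the caller pushes len first and fd last), and the prologue "push ebp / mov ebp, esp" puts the saved ebp at [ebp] and the return address at [ebp + 4]. A small C sketch of that arithmetic, assuming 4-byte arguments and that standard prologue (cdecl_arg_offset is a hypothetical helper):

```c
/* Offset from ebp of the 0-based index-th 4-byte cdecl argument, assuming
   the prologue "push ebp / mov ebp, esp":
     [ebp + 0]  saved ebp
     [ebp + 4]  return address
     [ebp + 8]  first argument (pushed last by the caller) */
int cdecl_arg_offset(int index)
{
    return 8 + 4 * index;
}
```

For l_gets(int fd, char *buf, int len) that puts fd at [ebp + 8], buf at [ebp + 12] and len at [ebp + 16]. Without a stack frame, the first argument is at [esp + 4] on entry, as Frank noted earlier.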
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 05, 2017, 07:48:54 PM: Here is what I came up with so far:

Code:
bits 32

section .data

section .text
global l_gets

l_gets:
    push ebp            ; prologue, set up stack frame
    mov ebp, esp
    xor eax, eax        ; zero eax to prepare for syscall #
    push ebx            ; preserve ebx
    mov ebx, [ebp + 8]  ; fd parameter goes into ebx
    mov ecx, [ebp + 12] ; char *buf stored into ecx
    mov edx, [ebp + 16] ; len stored into edx
    mov eax, 3          ; sys call for read
    int 0x80

; read data onto stack:
.buf_loop:              ; read each character one at a time, increment counter (in eax), when counter matches len, jump out of loop
    xor eax, eax        ; zero eax to be used for counter
    push ebx            ; push the character onto stack
    inc ebx             ; advance to next character
    inc eax             ; advance the counter
    cmp edx, eax
    je .done

.loop1:
    cmp register, byte 0x0A ; check for newline, exit loop if true
    je .done

.done:
Hopefully the comments are enough to show what I am trying to do here.
Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 05, 2017, 10:28:41 PM: Well... no...
Code:
bits 32

section .data

section .text
global l_gets

l_gets:
    push ebp            ; prologue, set up stack frame
    mov ebp, esp
    xor eax, eax        ; zero eax to prepare for syscall #
    push ebx            ; preserve ebx
    mov ebx, [ebp + 8]  ; fd parameter goes into ebx
    mov ecx, [ebp + 12] ; char *buf stored into ecx
    mov edx, [ebp + 16] ; len stored into edx
    mov eax, 3          ; sys call for read
    int 0x80
Up to here, I follow you. In fact, it looks like you're about done...

Code:
; read data onto stack:
.buf_loop:              ; read each character one at a time, increment counter (in eax), when counter matches len, jump out of loop
    xor eax, eax        ; zero eax to be used for counter
If you zero eax in the loop, it's going to run for a long time!
Code:
    push ebx            ; push the character onto stack
    inc ebx             ; advance to next character
Last I knew, ebx was your file descriptor...

Code:
    inc eax             ; advance the counter
    cmp edx, eax
    je .done
Fair enough... if you don't zero eax in the loop...
Code:
.loop1:
    cmp register, byte 0x0A ; check for newline, exit loop if true
    je .done
.done:
I don't see where we "loop", and to where... The last part of it won't even assemble!

After the sys_read, your data's in the buffer that the caller specified, and eax holds bytes read, including the linefeed that ends input. At least that's true if we're reading from stdin. I'm less sure of how sys_read will behave on a "real file" (or, for that matter, if stdin is redirected). If it's a "text file", okay, but what if it's a "binary file"? Are we expected to stop at any number 10 we encounter? I think of "gets()" as being exclusively for stdin, but your assigned "l_gets" is apparently different. I may have to experiment and see what happens on a "real file"...

I mentioned up above that you might want to "flush" any excess. That would apply only to stdin.

Later,
Frank
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 06, 2017, 02:04:54 AM: The point of this assignment is to use these functions for our next assignment, which makes a socket call to a web server and downloads a .html or .txt file, so it is supposed to be reading plain text (no binary).

I played with it a little more. I would like to use the stack as the buffer: push each byte onto the stack and use esp as the pointer to the buffer.

Code:
bits 32

section .data

section .text
global l_gets

l_gets:
    push ebp            ; prologue, set up stack frame
    mov ebp, esp
    xor eax, eax        ; zero eax to prepare for syscall #
    push ebx            ; preserve ebx
    mov ebx, [ebp + 8]  ; fd parameter goes into ebx
    mov ecx, [ebp + 12] ; char *buf stored into ecx
    mov edx, [ebp + 16] ; len stored into edx
    add esp, 12         ; will use esp for the pointer to the buffer, the bytes to be read will be pushed onto stack
    cmp edx, 0          ; if len is zero or less, exit program
    jle .done

; read data onto stack:
.buf_loop:
    mov eax, 3          ; sys call for read, to begin reading bytes
    int 0x80
    ; read each character one at a time, increment counter (in eax), when counter matches len, jump out of loop
    push esp
    add esp, 4          ; advance to next character
    inc eax             ; advance the counter
    cmp edx, eax
    je .done
; ignore stuff below this for now

.loop1:
    cmp register, byte 0x0A ; check for newline, exit loop if true
    je .done

.done:
Is this making sense?
Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 06, 2017, 03:50:23 AM: Well, no. This may be my fault. Shortly after my last post, I lost my computer, and it's taken me 'til just now to beat it into rebooting. I'm pretty frazzled, and haven't looked at any of the stuff I said I'd look at.

You're pretty far from a socket call. You can read from a socket with sys_read, but recv is more common. I don't see where this is going at all.

You can use the stack for a buffer. You want to subtract something from esp, not add it. You almost certainly do not want to use esp as a pointer into it.
Code:
    push esp
    add esp, 4
gets you right back where you were before pushing esp. It may be my confused mental state, but I don't see what you're trying to accomplish here. You cannot push a byte. You can push a word, but you don't usually want to.

You can read bytes one at a time with sys_read. It'll be slower than a gut-shot wolf bitch with nine suckling pups dragging a number nine trap uphill in a snowstorm, but you can do it. I don't see the point, when sys_read does exactly what your assignment describes.

Let me try to get myself organized and see if I can get back into this.

Best,
Frank
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 06, 2017, 06:10:46 AM: Frank, you are killing me here with your colloquialisms ;D ;D I needed the laugh after the stress this class is putting me through.

So I cleaned up the code a bit and I think got it closer to where I need to be:

Code:
bits 32

section .data

section .text
global l_gets

l_gets:
    push ebp            ; prologue, set up stack frame
    mov ebp, esp
    xor eax, eax        ; zero eax to prepare for syscall #
    push ebx            ; preserve ebx
    mov ebx, [ebp + 8]  ; fd parameter goes into ebx
    mov ecx, [ebp + 12] ; char *buf stored into ecx
    mov edx, [ebp + 16] ; len stored into edx
    cmp edx, 0          ; if len is zero or less, exit program
    jle .done

; read data onto stack:
.buf_loop:
    push esp
    cmp esp, byte 0x0A  ; if a newline character is encountered, exit
    je .done
    sub esp, 4          ; will use esp for the pointer to the buffer, the bytes to be read will be pushed onto stack
    mov eax, 3          ; sys call for read, to begin reading bytes- for sys read, ebx= int (fd), ecx= char, edx= size_t (len)
    int 0x80
    ; read each character one at a time, increment counter (in eax), when counter matches len, jump out of loop
    inc eax             ; advance the counter
    cmp edx, eax
    je .done
    jmp .buf_loop       ; continue

.done:
    mov eax, 1          ; sys exit
    mov ebx, edx        ; return value is number of bytes written (len)
    int 0x80
Maybe you could help me write a test main like you did before so I can test this one out? It is at least "compiling" (or do we say assembling?) fine.
Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 06, 2017, 09:08:24 AM: We say "assembling". It is correct to call it "compiling" (an assembler is a compiler for assembly language)... but the asm-heads will think you're a newbie if you do.

This is a bare minimum. I commented out a lot of your code. You do a bunch of stuff with esp before you even read anything. I moved the exit code up into the "test main" where it belongs. Your caller is not likely to be pleased if you exit in the middle of the function. The function wants an epilogue and return (with the count in eax... where sys_read puts it).

It only tests reading from stdin. In order to test with a file, we'd need a file descriptor. We could put a sys_open in the test main, or... you're going to write one, right?

You seem to be very concerned with finding that linefeed. It's at [ecx + eax -1] - sys_read stops when it sees it. If it's not there, the pesky user has typed more than we had room for... and we probably should "flush" that. If we don't, it'll screw up the next read... which may be after the program has exited! Try it! Type 10 characters and "ls" before you hit "enter" and see what happens. "ls" is harmless, but "rm ." is not!

Reading from a real file won't do that. It will stop after edx bytes and there's no "keyboard buffer". It "may" stop earlier if it sees a linefeed. I kinda don't think so, but I need to try it. If you're going to be reading from a socket, what you'll actually see is carriage return linefeed pairs. http is fussy about that! It would make a lot of sense to keep reading past them until we've got the whole file or a full buffer. But the assignment seems to say it wants us to stop...

Code:
; nasm -f elf32 l_gets.asm -d TESTMAIN
; ld -o l_gets l_gets.o
; (on a 64-bit system: ld -m elf_i386 -o l_gets l_gets.o)
bits 32

%ifdef TESTMAIN
section .data

section .bss
buf resb 10

section .text
global _start
_start:
    ; test a call to it
    push 10             ; length
    push buf
    push 0              ; stdin
    call l_gets
    add esp, 4 * 3

    ;print what we l_getsed
    mov eax, 4          ;sys_write
    mov ebx, 1          ; stdout
    mov ecx, buf
    mov edx, 10
    int 80h

exit:
    mov eax, 1          ; sys exit
    mov ebx, edx        ; return value is number of bytes written (len)
    int 0x80
%endif
;-----------------------------

section .text
global l_gets

l_gets:
    push ebp            ; prologue, set up stack frame
    mov ebp, esp
    ; xor eax, eax      ; zero eax to prepare for syscall #
    push ebx            ; preserve ebx
    mov ebx, [ebp + 8]  ; fd parameter goes into ebx
    mov ecx, [ebp + 12] ; char *buf stored into ecx
    mov edx, [ebp + 16] ; len stored into edx
    cmp edx, 0          ; if len is zero or less, exit program
    jle .done

; read data onto stack:
.buf_loop:
    ; push esp
    ; cmp esp, byte 0x0A ; if a newline character is encountered, exit
    ; je .done
    ; sub esp, 4        ; will use esp for the pointer to the buffer, the bytes to be read will be pushed onto stack
    mov eax, 3          ; sys call for read, to begin reading bytes- for sys read, ebx= int (fd), ecx= char, edx= size_t (len)
    int 0x80
    ; read each character one at a time, increment counter (in eax), when counter matches len, jump out of loop
    ; inc eax           ; advance the counter
    ; cmp edx, eax
    ; je .done
    ; jmp .buf_loop     ; continue

.done:
    pop ebx             ; restore caller's reg
    ; epilogue
    mov esp, ebp
    pop ebp
    ret
Tested (a little) but incomplete...

Best,
Frank
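The "flush" Frank mentions can be sketched in C with read(2). After a read of up to len bytes from a terminal, the line fit if the last byte is the linefeed; if the buffer came back full without one, the rest of the line is still pending and can be read and discarded one byte at a time. As Frank says, this only matters for stdin; flush_if_truncated is a hypothetical name.

```c
#include <unistd.h>

/* After n = read(fd, buf, len) on a terminal, the whole line fit if the last
   byte read is '\n'. If the buffer came back full without one, the rest of
   the line is still pending: read and discard it up to and including '\n'
   so it is not left for the next read (or for the shell after exit). */
void flush_if_truncated(int fd, const char *buf, int n, int len)
{
    char c;
    if (n <= 0 || n < len || buf[n - 1] == '\n')
        return;                       /* nothing read, or the line fit */
    while (read(fd, &c, 1) == 1 && c != '\n') {
        /* discard one pending byte */
    }
}
```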
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 06, 2017, 06:12:05 PM: Shouldn't I leave in the last part of .buf_loop?

Code:
    inc eax             ; advance the counter
    cmp edx, eax
    je .done
    jmp .buf_loop       ; continue
Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 06, 2017, 07:00:03 PM: I don't see why. After your sys_read, eax is the number of bytes actually read. edx is the maximum number to read. You're counting the number of bytes not read? There's something you're trying to do with this ".buf_loop" that I'm not getting. Quite possibly my fault...

Put it back in and see what it does, if you like...

Best,
Frank
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 06, 2017, 08:06:21 PM: - How do you declare a variable that can hold a dynamically sized amount of data (for buf)? Right now buf is only big enough for 10 characters.
- The program requirement is for it to stop when it reaches a linefeed; I think this is for parsing in the subsequent networking assignment. Why is the newline character at [ecx + eax -1]?
- Why are we only preserving and restoring ebx and not ecx or edx?
Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 06, 2017, 08:38:28 PM: You can get more memory with sys_brk... but you shouldn't have to worry about that. It's the caller's responsibility to provide the buffer and tell you how big it is. To get more buffer in "test main" just make it bigger.

The linefeed's at [ecx + eax -1] 'cause that's where it is. sys_read returns the number of bytes read in eax, and it stops when (doesn't stop until) it gets the linefeed. ecx, of course, is the beginning of the buffer.

As we've discussed, this is (probably) going to be different with a real file... or a socket. If the assignment really requires it, you could read one byte at a time (put 1 in edx and ignore what the caller tells you). Makes more sense to simply find it in the buffer...

Best,
Frank
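Frank's alternative, one bulk read followed by finding the linefeed in the buffer, looks like this as a C sketch (read_then_find_lf is a hypothetical name). It returns the count up to and including the first linefeed. The catch is that with a regular file, a pipe or a socket, read() has already consumed whatever followed the linefeed, so those bytes are not there for the next call.

```c
#include <string.h>
#include <unistd.h>

/* One bulk read, then look for the first '\n' in what arrived. Returns the
   count up to and including that linefeed, or everything read if there is
   none; 0 on EOF, error, or len <= 0. Bytes after the linefeed are left in
   buf but not counted, and read() has already consumed them. */
int read_then_find_lf(int fd, char *buf, int len)
{
    if (len <= 0)
        return 0;
    ssize_t n = read(fd, buf, (size_t)len);
    if (n <= 0)
        return 0;
    char *lf = memchr(buf, '\n', (size_t)n);
    return lf ? (int)(lf - buf) + 1 : (int)n;
}
```

Reading one byte at a time (edx = 1) avoids losing the tail, at the cost of one system call per byte.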
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 06, 2017, 09:09:17 PM: Will this code do what I need it to? It works using the test main and a preset buffer, but if it has to take in a variable amount of text, will it do that and return the number of bytes read as the exit code, even if a newline character comes before "len" bytes have been read?

Code:
global l_gets
l_gets:
    push ebp            ; prologue, set up stack frame
    mov ebp, esp
    push ebx            ; preserve ebx
    mov ebx, [ebp + 8]  ; fd parameter goes into ebx
    mov ecx, [ebp + 12] ; char *buf stored into ecx
    mov edx, [ebp + 16] ; len stored into edx
    cmp edx, 0          ; if len is zero or less, exit program
    jle .done
    cmp ecx, [ecx + eax -1] ; compare current character to new line, exit if true
    je .done

; read data onto stack:
    mov eax, 3          ; sys call for read, to begin reading bytes- for sys read, ebx= int (fd), ecx= char, edx= size_t (len)
    int 0x80

.done:
    pop ebx             ; restore the caller's ebx
    mov esp, ebp        ; epilogue- clean up the stack
    pop ebp
    ret
Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 06, 2017, 09:25:35 PM: Something like this?
Code:
bits 32

%ifdef TESTMAIN
section .data

section .bss
buf resb 80

section .data
filename db `l_gets.asm\0`  ; ourself - we know it's there

section .text
global _start
_start:
    mov eax, 5          ; sys_open
    mov ebx, filename
    xor ecx, ecx
    xor edx, edx
    int 80h
    test eax, eax
    js exit             ; bail out if error

    ; test a call to it
    push 80             ; length
    push buf
    push eax            ; fd
    call l_gets
    add esp, 4 * 3

    ; find NL - we want to stop there
    xor edx, edx
find:
    cmp [ecx + edx], byte 10
    je found
    inc edx
    jmp find
found:
    ; edx is length to print

    ;print what we l_getsed
    ; mov edx, eax      ; length read
    mov eax, 4          ;sys_write
    mov ebx, 1          ; stdout
    mov ecx, buf
    int 80h

exit:
    mov eax, 1          ; sys exit
    mov ebx, edx        ; return value is number of bytes written (len)
    int 0x80
%endif
;-----------------------------

global l_gets
l_gets:
    push ebp            ; prologue, set up stack frame
    mov ebp, esp
    ; xor eax, eax      ; zero eax to prepare for syscall #
    push ebx            ; preserve ebx
    mov ebx, [ebp + 8]  ; fd parameter goes into ebx
    mov ecx, [ebp + 12] ; char *buf stored into ecx
    mov edx, [ebp + 16] ; len stored into edx
    cmp edx, 0          ; if len is zero or less, exit program
    jle .done

; read data onto stack:
.buf_loop:
    ; push esp
    ; cmp esp, byte 0x0A ; if a newline character is encountered, exit
    ; je .done
    ; sub esp, 4        ; will use esp for the pointer to the buffer, the bytes to be read will be pushed onto stack
    mov eax, 3          ; sys call for read, to begin reading bytes- for sys read, ebx= int (fd), ecx= char, edx= size_t (len)
    int 0x80
    ; read each character one at a time, increment counter (in eax), when counter matches len, jump out of loop
    ; inc eax           ; advance the counter
    ; cmp edx, eax
    ; je .done
    ; jmp .buf_loop     ; continue

.done:
    pop ebx             ; restore caller's reg
    mov esp, ebp
    pop ebp
    ret
Could put the "find linefeed" part in the function. That may be what the assignment expects?

Best,
Frank

Ahhh, just read your latest. I think your code will work reading from stdin. (ahhh, no it won't - ecx is not likely 0xA - especially before the sys_read!) This tests from a real file. As I expected, it reads past the linefeed. Comment out the "find LF" part to see what it does without it...
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 07, 2017, 10:06:01 PM: Frank,

At this point my brain is feeling clobbered and I want to break this simple function down step by step.

First off, I would just like to write the part of the code that pushes each byte onto the stack to be read.

Would I implement this as a something like:

Code:
sys read
push buf
pop ecx
add [buf], 4
inc counter
if counter == len OR if ecx == 0x0A
    jmp done
Is this the correct way? I am having a tough time understanding how the stack can be used to store one byte at a time as a kind of "buffer," but my professor says that it can be done. He isn't too keen on giving us the exact code on how to do it, however.

We're gonna have to pop each byte as it is pushed onto the stack, correct?
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 07, 2017, 10:49:59 PM: Here is a stack diagram I drew to try to visualize how this code is working:

Code:
bits 32

section .text
l_gets:
    push ebp            ; prologue
    mov ebp, esp
    push ebx
    mov ebx, [ebp + 8]  ; set up registers on stack for args
    mov ecx, [ebp + 12]
    mov edx, [ebp + 16]
    cmp edx, 0
    jle .done

.read_loop:
    mov eax, 3          ; sys call read
    int 0x80
    push buf            ; push buf onto a stack because this is where each character is going to be stored
    pop ecx             ; pops the current character in ecx into buf
    ; this part I am getting hung up on, how to increment to the next character to be read in from whatever file is coming in
    cmp edx, char_count ; jump if number of bytes read has reached “len”
    je .done

.done:
    pop ebx
    mov esp, ebp        ; epilogue
    pop ebp
    ret
(http://i.imgur.com/tbed9hM.jpg)
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 08, 2017, 04:29:33 AM: i was able to write this functioning l_gets:

Code: [Select]
bits 32
section .text
global l_gets
l_gets:
    push ebp ; prologue
    mov ebp, esp
    push ebx ; preserve ebx
    mov ebx, [ebp + 8] ; fd parameter into ebx
    mov ecx, [ebp + 12] ; char *buf in ecx
    mov edx, [ebp + 16] ; len in edx
    cmp edx, 0 ; if len zero or less, exit
    jle .done
    mov eax, 3 ; sys read
    int 0x80
.done:
    pop ebx ; restore ebx
    mov esp, ebp ; epilogue
    pop ebp
    ret
When using your testmain, though, you move 10 to edx (for 10 characters). Even though I type fewer than 10 characters into stdin, I always get a return value of 10, but the function should return only the number of bytes read. This, and figuring out how to stop input after the newline character is read, are all that's left, and then I'm pretty much done with this one. Onto the next...
Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 08, 2017, 04:37:25 AM: Ahhh... your instructor is apparently not explaining this well. I probably won't explain it well, either. Such is life...

You can indeed put one byte on the stack, but forget about "push" and "pop" to do so. First, let us create a "local" variable on the stack to use as a "buffer":
Code: [Select]
func:
    push ebp
    mov ebp, esp
    sub esp, 64
    ...
The amount we subtract from esp should be a multiple of 4, just to keep the stack well aligned. It could be more (or less) but probably shouldn't be more than 4096 (a "page"). So we've got a local variable that we can use as a buffer (or anything else). Its address is ebp - 64. That's the address of the first (lowest) byte, as usual.
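A minimal sketch of the local-buffer idea, assuming 32-bit Linux with `int 0x80` system calls (assemble with `nasm -f elf32`, link with `ld -m elf_i386`). The function name `stack_demo` and the "Hi!" text are made up for illustration. It builds a few bytes at the bottom of the 64-byte area, starting at ebp - 64, and passes that address to sys_write; passing the same address to sys_read would fill the buffer instead.

```nasm
; Sketch (32-bit Linux, int 0x80): a 64-byte local buffer on the stack
bits 32
section .text
global _start

; stack_demo (hypothetical): builds "Hi!\n" in a local buffer and writes it
; to stdout; returns the sys_write result in eax
stack_demo:
    push ebp
    mov ebp, esp
    sub esp, 64                 ; local buffer: [ebp - 64] .. [ebp - 1]
    push ebx                    ; cdecl: ebx is callee-saved; it lands below the buffer
    lea ecx, [ebp - 64]         ; address of the lowest byte of the buffer
    mov byte [ecx], 'H'
    mov byte [ecx + 1], 'i'
    mov byte [ecx + 2], '!'
    mov byte [ecx + 3], 10      ; linefeed
    mov eax, 4                  ; sys_write
    mov ebx, 1                  ; stdout
    mov edx, 4                  ; length
    int 0x80
    pop ebx
    mov esp, ebp                ; discards the local buffer in one step
    pop ebp
    ret

_start:
    call stack_demo
    mov ebx, eax                ; exit status = bytes written (4 on success)
    mov eax, 1                  ; sys_exit
    int 0x80
```

It should print "Hi!" and exit with status 4, the count sys_write returned. Because ebx is pushed after the `sub esp, 64`, it sits below the buffer and does not overlap it.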

I guess what you're trying to do is to read one byte at a time into this buffer - instead of the buffer the caller has provided? If so, we ignore the buffer the caller has provided, and the length. We still need the file descriptor...
Code: [Select]
func:
    push ebp
    mov ebp, esp
    sub esp, 64
    mov ebx, [ebp + 8]
    lea ecx, [ebp - 64]
    xor esi, esi ; to use as counter
    mov edx, 1 ; read just one byte
.top:
    mov eax, 3 ; sys_read
    int 0x80
    cmp [ecx], byte 0xA ; linefeed?
    je .done
    inc ecx ; next byte in buffer
    inc esi ; count it
    cmp esi, 64 ; we're out of local buffer
    je .done
    cmp esi, [ebp + 16] ; caller's idea of length
    je .done
    jmp .top ; go get another one
.done:
    ; do something intelligent...
    ; clean up and go home
We may need more than one ".done:" label here... I don't think I'm too fond of this approach.

I think what I would rather do is a perfectly ordinary sys_read into the buffer the caller provides for the length the caller provides. Then... if the file descriptor is stdin, make sure that we do have that linefeed. If not, read into a dummy buffer until we find it to flush the keyboard buffer. This is complicated by the fact that *nix is fairly tolerant about file descriptors. If we try to read from stdout or even stderr, we still read from the keyboard. I'm not sure what happens with stdaux. If the file descriptor is 4 or greater, we presumably have a disk file or socket. This will read past the linefeed. In the case of a disk file, we could seek back to where the linefeed was found. I don't think this will work on a socket. We may have no choice but to read one byte at a time. I think I'd still read into the buffer the caller provides...
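The "ordinary sys_read, then flush stdin up to the linefeed" approach might look like the sketch below. Assumptions: 32-bit Linux with `int 0x80`; the name `l_gets_flush` is hypothetical; only fd 0 is flushed; and a short read without a linefeed is taken to mean nothing is left over. It returns whatever the first read returned, so a caller could still map negative values to -1.

```nasm
; Sketch (32-bit Linux, int 0x80): one normal read, then flush stdin to the LF
bits 32

section .bss
buf resb 4                      ; deliberately small so overflow is easy to trigger

section .text
global _start

; int l_gets_flush(int fd, char *buf, int len)   (hypothetical name)
l_gets_flush:
    push ebp
    mov ebp, esp
    sub esp, 4                  ; 1-byte dummy buffer, rounded up to 4 for alignment
    push ebx
    push esi
    mov ebx, [ebp + 8]          ; fd
    mov ecx, [ebp + 12]         ; buf
    mov edx, [ebp + 16]         ; len
    xor eax, eax                ; len <= 0: read nothing, return 0
    cmp edx, 0
    jle .done
    mov eax, 3                  ; sys_read into the caller's buffer
    int 0x80
    cmp eax, 0
    jle .done                   ; EOF or -errno: nothing to flush
    cmp ebx, 0                  ; only flush when the fd is stdin (assumption)
    jne .done
    cmp byte [ecx + eax - 1], 10
    je .done                    ; the line ended inside the buffer
    cmp eax, edx
    jne .done                   ; short read without LF: treated as nothing left over
    mov esi, eax                ; remember the count while we flush
.flush:
    mov eax, 3                  ; sys_read one byte into the dummy buffer
    lea ecx, [ebp - 4]
    mov edx, 1
    int 0x80
    cmp eax, 1
    jne .flushed                ; EOF or error ends the flush
    cmp byte [ebp - 4], 10
    jne .flush                  ; discard until the linefeed is consumed
.flushed:
    mov eax, esi
.done:
    pop esi
    pop ebx
    mov esp, ebp
    pop ebp
    ret

_start:
    mov esi, 2                  ; the second call shows the leftover was discarded
.again:
    push 4                      ; len
    push buf                    ; buf
    push 0                      ; fd = stdin
    call l_gets_flush
    add esp, 12                 ; cdecl: caller removes the arguments
    mov edx, eax                ; echo whatever was read
    mov eax, 4                  ; sys_write
    mov ebx, 1                  ; stdout
    mov ecx, buf
    int 0x80
    dec esi
    jnz .again
    mov eax, 1                  ; sys_exit(0)
    xor ebx, ebx
    int 0x80
```

Given "abcdefg", newline, "xy", newline on stdin, the first call should return "abcd", the flush discards "efg" and its newline, and the second call gets "xy" and its newline instead of the leftovers. One-byte reads only happen in the overflow case.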

Quote
We're gonna have to pop each byte as it is pushed onto the stack, correct?

I don't think so. Why push it at all if you're only going to pop it again? I'm still not sure I understand what you're trying to do here.

Your diagram looks pretty good except that you've left out the return address (below the parameters and above old ebp). Also, esp points at what you pushed last, not below it yet...
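To check that layout, a small sketch (32-bit Linux, `int 0x80`; the name `frame_demo` is hypothetical) reads both arguments through ebp and also confirms that the return address sits at [ebp + 4], between the arguments and the saved ebp.

```nasm
; Sketch (32-bit Linux, int 0x80): the frame after "push ebp / mov ebp, esp"
bits 32
section .text
global _start

; int frame_demo(int a, int b)   (hypothetical): returns a - b, or 255 if the
; return address is not where this layout says it is
;   [ebp + 12]  b               (pushed first by the caller)
;   [ebp + 8]   a               (pushed last, nearest the return address)
;   [ebp + 4]   return address  (pushed by call)
;   [ebp]       caller's ebp    (pushed by our prologue; esp points here)
frame_demo:
    push ebp
    mov ebp, esp
    mov eax, [ebp + 8]          ; a
    sub eax, [ebp + 12]         ; a - b
    cmp dword [ebp + 4], after_call
    je .ok
    mov eax, 255                ; layout assumption was wrong
.ok:
    mov esp, ebp
    pop ebp
    ret

_start:
    push 3                      ; b
    push 10                     ; a
    call frame_demo
after_call:                     ; call stored this address at [ebp + 4]
    add esp, 8                  ; caller cleans up both arguments
    mov ebx, eax                ; exit status: 10 - 3 = 7
    mov eax, 1                  ; sys_exit
    int 0x80
```

The program should exit with status 7; a status of 255 would mean the return address was somewhere else.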

Code: [Select]
push buf
pushes the address of buf - probably not useful. You could push four bytes and ignore all but one. To increment an address, you want to put it in a register and increment that.
Code: [Select]
; if ecx is caller's buffer...
    lea edi, [ebp - 64] ; our buffer on stack
.top:
    mov al, [ecx]
    mov [edi], al
    inc edi ; or use stosb for this and the line above
    inc ecx
    jmp .top
That will need an exit condition, of course. I've ignored preserving registers for now. Besides ebx (used by system calls), we'll probably need both esi and edi. That can be added - just trying to keep it simple. Doubt if I've succeeded...
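A sketch of the copy loop with three exit conditions (a linefeed was copied, the caller's count ran out, or the 64-byte stack buffer is full), assuming 32-bit Linux and `int 0x80`. The name `copy_line` and its echoing of the copy to stdout are hypothetical, there only to make the result visible. It uses `stosb` for the store-and-advance step and preserves ebx, esi and edi as cdecl requires.

```nasm
; Sketch (32-bit Linux, int 0x80): bounded byte copy into a stack buffer
bits 32

section .data
msg db "one", 10, "two", 10
msg_len equ $ - msg

section .text
global _start

; int copy_line(const char *src, int n)   (hypothetical name)
; copies into a 64-byte local buffer, writes the copy to stdout,
; and returns the number of bytes copied
copy_line:
    push ebp
    mov ebp, esp
    sub esp, 64                 ; our buffer: [ebp - 64] .. [ebp - 1]
    push ebx                    ; callee-saved registers live below the buffer
    push esi
    push edi
    mov esi, [ebp + 8]          ; src, the caller's buffer
    lea edi, [ebp - 64]         ; destination, our buffer on the stack
    xor edx, edx                ; bytes copied so far
    mov ecx, [ebp + 12]         ; n
    cmp ecx, 0
    jle .write                  ; nothing to copy
    cmp ecx, 64
    jle .copy
    mov ecx, 64                 ; never copy past the end of our buffer
.copy:
    mov al, [esi]
    stosb                       ; [edi] = al, edi += 1 (direction flag clear per the ABI)
    inc esi
    inc edx
    cmp al, 10                  ; exit 1: just copied a linefeed
    je .write
    cmp edx, ecx                ; exit 2: reached n or the buffer size
    jne .copy
.write:
    mov eax, 4                  ; sys_write
    mov ebx, 1                  ; stdout
    lea ecx, [ebp - 64]
    int 0x80                    ; edx = count; int 0x80 leaves edx unchanged
    mov eax, edx                ; return bytes copied
    pop edi
    pop esi
    pop ebx
    mov esp, ebp
    pop ebp
    ret

_start:
    push msg_len                ; n
    push msg                    ; src
    call copy_line
    add esp, 8
    mov ebx, eax                ; exit status: 4 ("one" plus the linefeed)
    mov eax, 1                  ; sys_exit
    int 0x80
```

With "one\ntwo\n" as input it should print "one" and exit with status 4, because the loop stops right after the first linefeed.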

Best,
Frank

next post: probably my fault you're getting the bad return value. Check closely what I've done...
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 08, 2017, 05:31:48 AM: Thanks, some things make sense now, others still don't.

Here's what I have so far. I'm not spending any time on this one tonight.

Code: [Select]
global l_gets
l_gets:
    push ebp ; prologue
    mov ebp, esp
    push ebx ; preserve ebx
    mov ebx, [ebp + 8] ; fd parameter into ebx
    mov ecx, [ebp + 12] ; char *buf in ecx
    mov edx, [ebp + 16] ; len in edx
    xor esi, esi ; counter
    cmp edx, 0 ; if len zero or less, exit
    jle .done
.char_loop:
    mov eax, 3 ; sys read
    int 0x80
    cmp [ecx], byte 0xA ; test for linefeed
    je .done
    inc ecx ; advance to next byte
    inc esi ; +count
    cmp esi, [ebp + 16] ; does read bytes = len?
    je .done
    jmp .char_loop
.done:
    mov eax, esi ; # bytes read into eax
    pop ebx ; restore ebx
    mov esp, ebp ; epilogue
    pop ebp
    ret
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 08, 2017, 05:40:14 PM: These additional functions I need to complete:

void l_puts(const char *buf);
write the contents of the null terminated string buf to stdout. The null byte must not be written. If the length of the string is zero, then no bytes are to be written.

int l_write(int fd, char *buf, int len);
write len bytes from buffer buf to file fd. Return the number of bytes actually written or -1 if an error occurs.

int l_open(const char *name, int flags, int mode);
opens the named file with the supplied flags and mode. Returns the integer file descriptor of the newly opened file or -1 if the file can't be opened.

int l_close(int fd);
close the indicated file, returns 0 on success or -1 on failure.

int l_exit(int rc);
terminate the calling program with exit code rc.

Can you maybe help to "guide" as far as how I should set these up so I can work on these this afternoon? Thank you again
Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 08, 2017, 06:17:52 PM: Well, "puts" is just a sys_write to stdout. The only "different" thing about it is the null-terminated string. You can call your l_strlen and move the length to edx, or do:
Code: [Select]
cmp [ecx + edx], byte 0 ; etc.
The rest of 'em are just wrappers around system calls. They'll look a lot like your l_gets. On error, eax will be -ERRNO; you need to change it to -1. The real C library does this and puts the error number in the global variable "errno". You apparently don't need to do that. You probably were supposed to do that for l_gets, too.
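Calling l_strlen from inside l_puts is an ordinary cdecl call: push the argument, call, then remove the argument. A sketch, assuming 32-bit Linux and `int 0x80`, with a bare-bones frameless `l_strlen` included so the block assembles on its own. Since ecx is caller-saved, the buffer address is reloaded after the call.

```nasm
; Sketch (32-bit Linux, int 0x80): l_puts reusing l_strlen for the length
bits 32
section .text
global _start

; int l_strlen(const char *s): minimal leaf, no frame needed
l_strlen:
    mov ecx, [esp + 4]          ; s
    xor eax, eax                ; index, and finally the length
.scan:
    cmp byte [ecx + eax], 0
    je .found
    inc eax
    jmp .scan
.found:
    ret

; void l_puts(const char *buf)
l_puts:
    push ebp
    mov ebp, esp
    push ebx                    ; callee-saved
    push dword [ebp + 8]        ; argument for l_strlen
    call l_strlen               ; eax = length; ecx may be clobbered
    add esp, 4                  ; cdecl: caller removes the argument
    mov edx, eax                ; length
    mov ecx, [ebp + 8]          ; reload buf after the call
    mov ebx, 1                  ; stdout
    mov eax, 4                  ; sys_write
    int 0x80                    ; a zero length writes nothing
    pop ebx
    mov esp, ebp
    pop ebp
    ret

section .data
msg db "hello", 10, 0

section .text
_start:
    push msg
    call l_puts
    add esp, 4
    mov eax, 1                  ; sys_exit(0)
    xor ebx, ebx
    int 0x80
```

It should print "hello". An empty string gives edx = 0, and sys_write with a zero length writes nothing, which matches the assignment's rule for zero-length strings.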

Best,
Frank
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 08, 2017, 09:45:49 PM: Does this look good for l_puts:

Code: [Select]
bits 32
section .text
global l_puts
l_puts:
    push ebp ; prologue
    mov ebp, esp
    push ebx ; preserve ebx
    mov ebx, [ebp + 8] ; const char *buf goes into ebx
.char_loop:
    cmp [ebx], byte 0x0 ; look for null terminator
    je .done
    mov eax, 4 ; sys write
    int 0x80
    jmp .char_loop
.done:
    pop ebx ; restore ebx
    mov esp, ebp ; epilogue
    pop ebp
    ret
I'm thinking I would need to write each byte, kind of like l_gets reads in each byte. The sys_write takes three args: int, const char*, size_t. Would I need to set up the stack in the .char_loop so that ebx, ecx, and edx hold each of these arguments? I'm a little confused here because the actual call to l_puts only takes in one argument (or are these called parameters?), so I'm not sure how to set this up.
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 08, 2017, 10:22:26 PM: Here is what I have so far for l_write:

Code: [Select]
bits 32
section .text
global l_write
l_write:
    push ebp ;prologue
    mov ebp, esp
    push ebx
    mov ebx, [ebp + 8] ; fd stored in ebx
    mov ecx, [ebp + 12] ; char *buf stored in ecx
    mov edx, [ebp + 16] ; len stored in edx
    xor esi, esi ; counter
    cmp edx, 0 ; check for error
    jle .error
.char_loop:
    mov eax, 4 ; sys write
    int 0x80
    inc ecx
    inc esi
    cmp esi, [ebp + 16] ; does bytes written = len?
    je .done
    jmp .char_loop
.error:
    mov eax, -1 ; error
    pop ebx ; restore ebx
    mov esp, ebp ; epilogue
    pop ebp
    ret
.done:
    mov eax, esi ; return # bytes written
    pop ebx ; restore ebx
    mov esp, ebp ; epilogue
    pop ebp
    ret
Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 08, 2017, 10:26:54 PM: Well, no. The one and only parameter (you can call it an argument), the address of the string, goes in ecx. We know that the file descriptor wants to be stdout (1) - that goes in ebx. The length, which you need to find either by calling l_strlen or by finding the zero here, goes in edx. You might want to use a loop like this:
Code: [Select]
; address is in ecx
    xor edx, edx
.find:
    cmp [ecx + edx], byte 0
    jnz .found
    inc edx
    jmp .find
Or call l_strlen...

Best,
Frank
Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 08, 2017, 10:41:33 PM: Code: [Select]
bits 32
section .text
global l_write
l_write:
    push ebp ;prologue
    mov ebp, esp
    push ebx
    mov ebx, [ebp + 8] ; fd stored in ebx
    mov ecx, [ebp + 12] ; char *buf stored in ecx
    mov edx, [ebp + 16] ; len stored in edx
    xor esi, esi ; counter
    ; you don't need a counter, eax does it
    ; if you do use esi, you need to push/pop it with ebx
    cmp edx, 0 ; check for error
    ; and eax will be negative if error, not edx
    ; and you need to do this after sys_write, not before
    jle .error
.char_loop:
    mov eax, 4 ; sys write
    int 0x80
    inc ecx
    inc esi
    cmp esi, [ebp + 16] ; does bytes written = len?
    je .done
    ; eax will be edx... even if some of 'em are garbage
    ; unless there's an error
    jmp .char_loop
.error:
    mov eax, -1 ; error
    pop ebx ; restore ebx
    mov esp, ebp ; epilogue
    pop ebp
    ret
.done:
    mov eax, esi ; return # bytes written
    pop ebx ; restore ebx
    mov esp, ebp ; epilogue
    pop ebp
    ret
Best,
Frank
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 08, 2017, 11:03:25 PM: Code: [Select]
; address is in ecx
    xor edx, edx
.find:
    cmp [ecx + edx], byte 0
    jnz .found
    inc edx
    jmp .find
Did you mean that to say "je .found" instead of "jnz .found" ?
Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 08, 2017, 11:19:42 PM: Sure enough. My bad.

Best,
Frank
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 09, 2017, 06:53:20 PM: OK so I got a lot going through my head now, back to the l_gets,

can you take a look at the code and let me know where I should go from here?

Code: [Select]
l_gets:
    push ebp ; prologue
    mov ebp, esp
    push ebx ; preserve ebx
    mov ebx, [ebp + 8] ; fd parameter into ebx
    mov ecx, [ebp + 12] ; char *buf in ecx
    mov edx, [ebp + 16] ; len in edx
    xor esi, esi ; counter
    cmp edx, 0 ; if len zero or less, exit
    jle .done
.char_loop:
    mov eax, 3 ; sys read
    int 0x80
    cmp [ecx], byte 0xA ; test for linefeed
    je .done
    inc ecx ; advance to next byte
    inc esi ; +count
    cmp esi, [ebp + 16] ; does read bytes = len?
    je .done
    jmp .char_loop
.done:
    mov eax, esi ; # bytes read into eax
    pop ebx ; restore ebx
    mov esp, ebp ; epilogue
    pop ebp
    ret
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 09, 2017, 07:57:08 PM: Frank can you provide a little insight as to why

Code: [Select]
cmp [ecx + edx], byte 0
is used to find a null terminator for "l_puts"

ecx= the pointer to the address of the string, correct? void l_puts(const char *buf)

if edx starts at 0, is the first iteration checking the first character in the string for 0?
Then if not zero, edx increments to 1, does that mean that the second byte in the string (or "character array") is then checked for zero? It's not making sense to me how this is working and seems like we have to take a lot on faith in this programming stuff.
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 09, 2017, 08:13:09 PM: code for l_puts:

Code: [Select]
bits 32
section .text
global l_puts
l_puts:
    push ebp ; prologue
    mov ebp, esp
    push ebx ; preserve ebx
    push edi ; preserve edi
    push esi ; preserve esi
    mov ebx, 1 ; 1= stdout
    mov ecx, [ebp + 12] ; const char *buf [address of string] goes into ecx
    xor edx, edx
.char_loop:
    cmp [ecx + edx], byte 0 ; look for null terminator
    je .done
    mov eax, 4 ; sys write
    int 0x80
    inc edx
    jmp .char_loop
.done:
    pop esi ; restore esi
    pop edi ; restore edi
    pop ebx ; restore ebx
    mov esp, ebp ; epilogue
    pop ebp
    ret
notice I am now preserving and restoring edi and esi registers in addition to ebx, per instructions by my professor
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 09, 2017, 09:22:42 PM: What I've got so far for l_write:

Code: [Select]
bits 32
section .text
global l_write
l_write:
    push ebp ;prologue
    mov ebp, esp
    push ebx ; preserve registers
    push edi
    push esi
    mov ebx, [ebp + 8] ; fd stored in ebx
    mov ecx, [ebp + 12] ; char *buf stored in ecx
    mov edx, [ebp + 16] ; len stored in edx
    xor esi, esi ; counter
    cmp edx, 0 ; check for 0 len
    jle .done
.char_loop:
    mov eax, 4 ; sys write
    int 0x80
    cmp eax, 0 ; check for error
    jle .error
    inc ecx ; move to next char
    inc esi ; increment counter
    cmp esi, [ebp + 16] ; does bytes written = len?
    je .done
    jmp .char_loop
.error:
    mov eax, -1 ; error
    pop esi ; restore registers
    pop edi
    pop ebx
    mov esp, ebp ; epilogue
    pop ebp
    ret
.done:
    mov eax, esi ; return # bytes written
    pop esi
    pop edi
    pop ebx ; restore registers
    mov esp, ebp ; epilogue
    pop ebp
    ret
l_write instructions:

int l_write(int fd, char *buf, int len)
write len bytes from buffer buf to file fd. Return the number of bytes actually written or -1 if an error occurs.
Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 09, 2017, 09:35:43 PM: If you're using esi, you ought to preserve it. Push it right after ebx and pop it right before, or the other way around. That's easy.

Now... s'pose we're reading from stdin, and the user types nothing but the "enter" key. After the sys_read eax will be 1 and that's what we want to return - but esi is still zero, no? You may want to start off with esi = 1. Suppose the length, as provided by the caller, is 4, and the user types 3 characters and "enter". eax will be 4 and I guess that's what esi will be when we put it back into eax. I may have to try that one. If the user types 4 or more characters before "enter", eax will be 4 and that's what we'll return. The linefeed and perhaps some characters will remain in the "keyboard buffer" to mess us up later unless we flush them. The assignment doesn't say anything about that, so I guess we can ignore it. We may regret that.

Suppose we're reading from a disk file, or socket. We'll read edx bytes, regardless of linefeeds. Your code counts up to the linefeed (if any) and returns that, "as if" we had stopped at the linefeed. But we didn't. Another read from that file will start where we left off, edx bytes into the file, not at the linefeed. This may not be satisfactory. The assignment says to stop at the linefeed. The only way I can think of to do that is to read one byte at a time, ugly as that is. I really don't know what to advise you on this. Best to stick to the assignment, I'm afraid...
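For a disk file there is one alternative to one-byte reads, mentioned earlier in the thread: read a whole chunk, keep everything up to the linefeed, and seek back over the rest. A sketch, assuming 32-bit Linux, `int 0x80`, sys_lseek as call 19 with whence SEEK_CUR = 1, and a hypothetical name `l_gets_seek`. It only helps with seekable descriptors. On a pipe or socket sys_lseek fails with -ESPIPE and the extra bytes are lost, which is why one-byte reads remain the general answer.

```nasm
; Sketch (32-bit Linux, int 0x80): read a chunk, keep up to the LF, seek back
bits 32

section .bss
buf resb 80

section .text
global _start

; int l_gets_seek(int fd, char *buf, int len)   (hypothetical name)
l_gets_seek:
    push ebp
    mov ebp, esp
    push ebx
    push esi
    mov ebx, [ebp + 8]          ; fd
    mov ecx, [ebp + 12]         ; buf
    mov edx, [ebp + 16]         ; len
    xor eax, eax                ; len <= 0: return 0
    cmp edx, 0
    jle .done
    mov eax, 3                  ; sys_read: may read past the linefeed
    int 0x80
    cmp eax, 0
    jle .done                   ; EOF or -errno: return as-is
    xor esi, esi                ; index of the byte being examined
.find:
    cmp byte [ecx + esi], 10
    je .found
    inc esi
    cmp esi, eax
    jl .find
    jmp .done                   ; no linefeed: keep everything that was read
.found:
    inc esi                     ; esi = bytes up to and including the linefeed
    mov edx, esi
    sub edx, eax                ; edx = -(bytes read past the linefeed)
    jz .keep                    ; the linefeed was the last byte read
    mov ecx, edx                ; offset (negative)
    mov edx, 1                  ; whence = SEEK_CUR
    mov eax, 19                 ; sys_lseek
    int 0x80                    ; result ignored in this sketch
.keep:
    mov eax, esi                ; return the line length
.done:
    pop esi
    pop ebx
    mov esp, ebp
    pop ebp
    ret

_start:
    mov esi, 2                  ; two calls: each should return exactly one line
.again:
    push 80                     ; len
    push buf                    ; buf
    push 0                      ; fd = stdin (redirect a regular file into it)
    call l_gets_seek
    add esp, 12
    mov edx, eax                ; echo the line
    mov eax, 4                  ; sys_write
    mov ebx, 1                  ; stdout
    mov ecx, buf
    int 0x80
    dec esi
    jnz .again
    mov eax, 1                  ; sys_exit(0)
    xor ebx, ebx
    int 0x80
```

With a regular file of three lines redirected to stdin, the two calls should print the first two lines and nothing more; without the seek, the first read would swallow the whole file and the second call would return 0.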

If I get to it, I'll download your code and try it. As we have discovered, untested code can have misteaks. :)

Best,
Frank

Aw, jeez, three new messages? I'll get back to ya...
Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 09, 2017, 11:06:57 PM: Quote from: turtle13 on September 09, 2017, 07:57:08 PM
Frank can you provide a little insight as to why

Code: [Select]
cmp [ecx + edx], byte 0
is used to find a null terminator for "l_puts"

ecx= the pointer to the address of the string, correct? void l_puts(const char *buf)

if edx starts at 0, is the first iteration checking the first character in the string for 0?
Then if not zero, edx increments to 1, does that mean that the second byte in the string (or "character array") is then checked for zero? It's not making sense to me how this is working and seems like we have to take a lot on faith in this programming stuff.

No faith, just logic. Everything you say is correct and as it should be. If the zero is the first character, the length is zero - we don't want to count the zero as part of the length. If the zero is the second character, the length is 1, etc.
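The indexed form `[ecx + edx]` is equivalent to walking a pointer forward and subtracting the start at the end. A sketch comparing the two, assuming 32-bit Linux and `int 0x80`; `strlen_idx`, `strlen_ptr` and the `CHECK` macro are hypothetical names. Both should report 0 for "", 1 for "a" and 5 for "hello", and the program exits with status 0 only if all six results match.

```nasm
; Sketch (32-bit Linux, int 0x80): indexed scan vs. walking pointer
bits 32

section .data
s0 db 0                         ; ""      -> 0
s1 db "a", 0                    ; "a"     -> 1
s5 db "hello", 0                ; "hello" -> 5

section .text
global _start

; both take const char *s at [esp + 4] and return the length in eax
strlen_idx:                     ; ecx stays at the start; edx counts up
    mov ecx, [esp + 4]
    xor edx, edx
.scan:
    cmp byte [ecx + edx], 0     ; byte number edx of the string
    je .done
    inc edx
    jmp .scan
.done:
    mov eax, edx
    ret

strlen_ptr:                     ; eax walks forward; length = end - start
    mov ecx, [esp + 4]
    mov eax, ecx
.scan:
    cmp byte [eax], 0
    je .done
    inc eax
    jmp .scan
.done:
    sub eax, ecx
    ret

%macro CHECK 3                  ; CHECK function, string, expected length
    push %2
    call %1
    add esp, 4
    cmp eax, %3
    jne fail
%endmacro

_start:
    CHECK strlen_idx, s0, 0
    CHECK strlen_idx, s1, 1
    CHECK strlen_idx, s5, 5
    CHECK strlen_ptr, s0, 0
    CHECK strlen_ptr, s1, 1
    CHECK strlen_ptr, s5, 5
    mov eax, 1                  ; sys_exit(0): all six lengths matched
    xor ebx, ebx
    int 0x80
fail:
    mov eax, 1                  ; sys_exit(1): some length was wrong
    mov ebx, 1
    int 0x80
```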

However...
Code: [Select]
bits 32
section .text
global l_puts
l_puts:
    push ebp ; prologue
    mov ebp, esp
    push ebx ; preserve ebx
    push edi ; preserve edi
    push esi ; preserve esi
    mov ebx, 1 ; 1= stdout
    mov ecx, [ebp + 8] ; const char *buf [address of string] goes into ecx
    ; Yes, but it's at [ebp + 8], not [ebp + 12], being the first and only parameter!
    xor edx, edx
.char_loop:
    cmp [ecx + edx], byte 0 ; look for null terminator
    ; je .done  No, only "found length", not "done"!
    je .found_length
    inc edx
    jmp .char_loop
.found_length:
    ; now we've got ebx, ecx, and edx where we want 'em
    mov eax, 4 ; sys write
    int 0x80
    test eax, eax ; just to set flags
    jns .done ; no error
    mov eax, -1
    ; or eax, -1 ; shorter way to do the same thing
    ; inc edx
    ; jmp .char_loop
.done:
    pop esi ; restore esi
    pop edi ; restore edi
    pop ebx ; restore ebx
    mov esp, ebp ; epilogue
    pop ebp
    ret
Does no harm to preserve registers we don't use.

"l_write" is simpler than you've got it. With the exception of making the error -1, depriving the caller of "what" went wrong, it's just sys_write.
Code: [Select]
bits 32
section .text
global l_write
l_write:
    push ebp ;prologue
    mov ebp, esp
    push ebx ; preserve registers
    push edi
    push esi
    mov ebx, [ebp + 8] ; fd stored in ebx
    mov ecx, [ebp + 12] ; char *buf stored in ecx
    mov edx, [ebp + 16] ; len stored in edx
    ; xor esi, esi ; counter
    ; we don't need a counter
    cmp edx, 0 ; check for 0 len
    jle .done ; does no harm to check caller for idiocy
    ; we don't write anything anyway
    ; .char_loop:
    mov eax, 4 ; sys write
    int 0x80
    cmp eax, 0 ; check for error
    jle .error ; probably should be just "jl"
    ; strictly speaking, 0 is not an error
    ; inc ecx ; move to next char
    ; inc esi ; increment counter
    ; cmp esi, [ebp + 16] ; does bytes written = len?
    ; je .done
    ; jmp .char_loop
    ; we don't need any of that
.error:
    mov eax, -1 ; error
    pop esi ; restore registers
    pop edi
    pop ebx
    mov esp, ebp ; epilogue
    pop ebp
    ret
.done:
    mov eax, esi ; return # bytes written
    pop esi
    pop edi
    pop ebx ; restore registers
    mov esp, ebp ; epilogue
    pop ebp
    ret
No real need to duplicate the entire "clean up and go home". We just need to make eax -1 if it was negative (depriving the caller of useful information) and leave it alone if no error. Does no harm.
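Following that advice, l_write shrinks to a single sys_write with one exit path: pass eax through when it is zero or positive, and turn a negative value into -1. A sketch assuming 32-bit Linux and `int 0x80`; the test in `_start` assumes fd 99 is not open.

```nasm
; Sketch (32-bit Linux, int 0x80): l_write with one exit path
bits 32

section .data
ok db "ok", 10

section .text
global _start
global l_write

; int l_write(int fd, char *buf, int len)
l_write:
    push ebp
    mov ebp, esp
    push ebx                    ; the only callee-saved register we touch
    xor eax, eax                ; len <= 0: nothing written, return 0
    mov edx, [ebp + 16]         ; len
    cmp edx, 0
    jle .done
    mov ebx, [ebp + 8]          ; fd
    mov ecx, [ebp + 12]         ; buf
    mov eax, 4                  ; sys_write: eax = bytes written or -errno
    int 0x80
    test eax, eax
    jns .done                   ; zero or positive: already the right answer
    or eax, -1                  ; any error becomes -1
.done:
    pop ebx
    mov esp, ebp
    pop ebp
    ret

_start:
    push 3                      ; len
    push ok                     ; buf
    push 1                      ; fd = stdout
    call l_write
    add esp, 12
    cmp eax, 3                  ; expect all 3 bytes written
    jne .fail
    push 3
    push ok
    push 99                     ; assumed not to be an open fd
    call l_write
    add esp, 12
    cmp eax, -1                 ; expect the error mapped to -1
    jne .fail
    mov eax, 1                  ; sys_exit(0)
    xor ebx, ebx
    int 0x80
.fail:
    mov eax, 1                  ; sys_exit(1)
    mov ebx, 1
    int 0x80
```

No counter is needed, because the number of bytes written is exactly what sys_write leaves in eax.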

It probably would have been a good idea to make each of these functions a separate "topic". A little late now.

Now... see if I still feel like looking at l_gets...

Best,
Frank
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 10, 2017, 02:46:43 AM: ok, nearly final l_write:

Code: [Select]
bits 32
section .text
global l_write
l_write:
    push ebp ;prologue
    mov ebp, esp
    push ebx ; preserve registers
    push edi
    push esi
    mov ebx, [ebp + 8] ; fd stored in ebx
    mov ecx, [ebp + 12] ; char *buf stored in ecx
    mov edx, [ebp + 16] ; len stored in edx
    xor eax, eax ; return 0 if len is zero or less
    cmp edx, 0 ; check for 0 len
    jle .done
    mov eax, 4 ; sys write
    int 0x80
    cmp eax, 0 ; check for error (when eax is less than zero)
    jl .error
.done:
    pop esi
    pop edi
    pop ebx ; restore registers
    mov esp, ebp ; epilogue
    pop ebp
    ret
.error:
    mov eax, -1 ; error
    jmp .done
I'm not following how this would return the number of bytes written (which is why I was using esi as a counter: every byte written would increment it by 1, and then it would be moved into eax before returning), unless that part needs to be amended.
Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 10, 2017, 02:54:38 AM: This is what I've got for l_gets. It's pretty much your code. I moved "inc esi" up to the top so our first try is 1, not 0. If esi==len, we do want to do that read, in case the LF is there. Now we do. I cut back to reading one byte at a time, little as I like it. Fugly! I did not attempt to "flush the buffer". I indicated where we might want to - only if we're reading from stdin!

Code: [Select]
; nasm -f elf32 l_gets.asm -d TESTMAIN
; ld -o l_gets l_gets.o
bits 32
%ifdef TESTMAIN
section .bss
buf resb 80
fd resd 1
section .data
filename db `l_gets.asm\0` ; ourself - we know it's there
section .text
global _start
_start:
    mov eax, 5 ; sys_open
    mov ebx, filename
    xor ecx, ecx
    xor edx, edx
    int 80h
    test eax, eax
    js exit ; bail out if error
    mov [fd], eax
    ; test a call to it
    ; try multiple calls if we're reading file
    ; just to make sure we're stopping at LF
    ; and can continue from there
    mov esi, 7
top:
    push 80 ; length
    push buf
    push dword [fd]
    call l_gets
    add esp, 4 * 3
    ; print what we l_getsed - l_got?
    mov edx, eax ; length read
    mov eax, 4 ;sys_write
    mov ebx, 1 ; stdout
    mov ecx, buf
    int 80h
    ; only if we're doing multiple reads
    dec esi
    jnz top
exit:
    mov ebx, eax ; return value is number of bytes written (len)
    mov eax, 1 ; sys exit
    int 0x80
%endif
;-----------------------------
section .text
global l_gets
l_gets:
    push ebp ; prologue
    mov ebp, esp
    push ebx ; preserve ebx
    push esi
    mov ebx, [ebp + 8] ; fd parameter into ebx
    mov ecx, [ebp + 12] ; char *buf in ecx
    mov edx, [ebp + 16] ; len in edx
    xor esi, esi ; counter
    cmp edx, 0 ; if len zero or less, exit
    jle .done
    mov edx, 1 ; read one byte at a time. ugh!
.char_loop:
    inc esi ; increment count first
    mov eax, 3 ; sys read
    int 0x80
    cmp [ecx], byte 0xA ; test for linefeed
    je .done
    inc ecx ; advance to next byte
    cmp esi, [ebp + 16] ; does read bytes = len?
    je .done
    ; if this happens, we didn't find a LF
    ; if we're reading stdin, this indicates overflow
    ; we might want to flush OS's input buffer ("keyboard buffer")
    jmp .char_loop
.done:
    mov eax, esi ; # bytes read into eax
    pop esi
    pop ebx ; restore ebx
    mov esp, ebp ; epilogue
    pop ebp
    ret
;-----------------------------------------------
That's as far as I got. I'm not sure it's "complete". See what you think...

Best,
Frank

Ah, again we're bumping into each other. Your l_write looks good at first glance. It returns the number of bytes written because that's what sys_write does!
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 10, 2017, 02:56:06 AM: completed l_puts:

Code: [Select]
bits 32
section .text
global l_puts
l_puts:
    push ebp ; prologue
    mov ebp, esp
    push ebx ; preserve ebx
    push edi ; preserve edi
    push esi ; preserve esi
    mov ebx, 1 ; 1= stdout
    mov ecx, [ebp + 8] ; const char *buf [address of string] goes into ecx (first and only parameter)
    xor edx, edx ; set up a counter
.char_loop:
    cmp [ecx + edx], byte 0 ; look for null terminator
    je .found_len
    inc edx ; counter + 1
    jmp .char_loop
.found_len:
    mov eax, 4 ; sys write
    int 0x80
    test eax, eax ; set flags
    jns .done ; jump if zero or positive (no error)
    mov eax, -1 ; set error code if error
.done:
    pop esi ; restore esi
    pop edi ; restore edi
    pop ebx ; restore ebx
    mov esp, ebp ; epilogue
    pop ebp
    ret
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 10, 2017, 03:04:31 AM: completed l_open:

Code: [Select]
bits 32
section .text
global l_open
l_open:
    push ebp ; prologue
    mov ebp, esp
    push ebx ; preserve registers
    push edi
    push esi
    mov ebx, [ebp + 8] ; name of the file (const char * name)
    mov ecx, [ebp + 12] ; flags
    mov edx, [ebp + 16] ; mode
    mov eax, 5 ; open sys call
    int 0x80
    cmp eax, 0 ; check for error
    jl .error
.done:
    pop esi ; restore registers
    pop edi
    pop ebx
    mov esp, ebp ; epilogue
    pop ebp
    ret
.error:
    mov eax, -1
    jmp .done
int l_open(const char *name, int flags, int mode)
opens the named file with the supplied flags and mode. Returns the integer file descriptor of the newly opened file or -1 if the file can't be opened.
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 10, 2017, 03:19:27 AM: completed l_close:

Code: [Select]
bits 32
section .text
global l_close
l_close:
    push ebp ; prologue
    mov ebp, esp
    push ebx ; preserve registers
    push edi
    push esi
    mov ebx, [ebp + 8] ; file descriptor
    mov eax, 6 ; close sys call
    int 0x80
    cmp eax, 0
    jl .error ; check for error
    je .success ; check for success
.done:
    pop esi ; restore registers
    pop edi
    pop ebx
    mov esp, ebp ; epilogue
    pop ebp
    ret
.error:
    mov eax, -1
    jmp .done
.success:
    mov eax, 0
    jmp .done
int l_close(int fd)
close the indicated file, returns 0 on success or -1 on failure.
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 10, 2017, 03:57:30 AM: And finally, the last function, "l_exit"

Code: [Select]
bits 32
section .text
global l_exit
l_exit:
    ; are these needed for exit:
    ; push ebp ; prologue
    ; mov ebp, esp
    ; push ebx ; preserve registers
    ; push edi
    ; push esi
    mov ebx, [esp + 4] ; int rc (no frame was set up, so rc sits just above the return address)
    mov eax, 1 ; exit sys call
    int 0x80
int l_exit(int rc)
terminate the calling program with exit code rc.
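Because sys_exit never returns, l_exit has nothing to restore, so the prologue and the register saves are optional. Without a prologue, however, ebp still holds the caller's frame pointer, so rc has to be read relative to esp: on entry [esp] is the return address and rc is at [esp + 4]. A sketch, assuming 32-bit Linux and `int 0x80`:

```nasm
; Sketch (32-bit Linux, int 0x80): l_exit without a stack frame
bits 32
section .text
global _start
global l_exit

; void l_exit(int rc): sys_exit never returns, so nothing is saved or restored
l_exit:
    mov ebx, [esp + 4]          ; rc: [esp] is the return address, rc is just above it
    mov eax, 1                  ; sys_exit
    int 0x80                    ; does not come back

_start:
    push 42                     ; rc
    call l_exit                 ; an ordinary cdecl call, as a C caller would make it
```

The program should exit with status 42.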
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 10, 2017, 04:03:38 AM: Should I be performing xor on all of these registers at the beginning of all of these functions?
Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 10, 2017, 04:21:18 AM: Your l_gets seems to work great! However, it looks like it is repeating the same line 7 times. Care to elaborate on that one?
The return value 37 is correct, 36 characters + linefeed :D

(http://i.imgur.com/OMZFsOh.png)

*edit aha, never mind, I see in the testmain code that you are repeating 7 times (esi= 7)
Ignore this post!

Title: Re: Help with writing custom C type string functions using NASM
Post by: turtle13 on September 15, 2017, 12:50:01 AM: Regarding l_open not working, here are the feedback comments:

Your l_open doesn't work correctly. After sys_open, you don't check the return value. Your l_open should return -1 if you fail to open the file for any reason.

My l_open code:

Code: [Select]
l_open:
    push ebp ; prologue
    mov ebp, esp
    push ebx ; preserve registers
    push edi
    push esi
    mov ebx, [ebp + 8] ; name of the file (const char * name)
    mov ecx, [ebp + 12] ; flags
    mov edx, [ebp + 16] ; mode
    mov eax, 5 ; open sys call
    int 0x80
    cmp eax, 0 ; check for error
    jl .error
.done:
    pop esi ; restore registers
    pop edi
    pop ebx
    mov esp, ebp ; epilogue
    pop ebp
    ret
.error:
    mov eax, -1
    jmp .done
So.. how to fix so that it returns -1 for failure to open?
Title: Re: Help with writing custom C type string functions using NASM
Post by: Frank Kotler on September 15, 2017, 01:59:45 AM: That looks correct to me!
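The posted l_open does check the sys_open result (`cmp eax, 0` / `jl .error`). One way to test it against the feedback is a small harness that opens a path assumed not to exist and one assumed to exist (`/dev/null`). The paths and the harness are illustrative, assuming 32-bit Linux and `int 0x80`. The process exits with 0 only if the first call returns -1 and the second returns a descriptor.

```nasm
; Sketch (32-bit Linux, int 0x80): does l_open return -1 on failure?
bits 32

section .data
missing db "/nonexistent-dir/l_open-test", 0   ; assumed not to exist
present db "/dev/null", 0                      ; assumed to exist

section .text
global _start
global l_open

; int l_open(const char *name, int flags, int mode), same logic as posted
l_open:
    push ebp
    mov ebp, esp
    push ebx
    push edi
    push esi
    mov ebx, [ebp + 8]          ; name
    mov ecx, [ebp + 12]         ; flags
    mov edx, [ebp + 16]         ; mode
    mov eax, 5                  ; sys_open
    int 0x80
    cmp eax, 0
    jl .error                   ; -errno
.done:
    pop esi
    pop edi
    pop ebx
    mov esp, ebp
    pop ebp
    ret
.error:
    mov eax, -1
    jmp .done

_start:
    push 0                      ; mode (unused without O_CREAT)
    push 0                      ; flags = O_RDONLY
    push missing
    call l_open
    add esp, 12
    cmp eax, -1                 ; expect the failure mapped to -1
    jne .fail
    push 0
    push 0
    push present
    call l_open
    add esp, 12
    cmp eax, 0                  ; expect a valid descriptor (>= 0)
    jl .fail
    mov eax, 1                  ; sys_exit(0)
    xor ebx, ebx
    int 0x80
.fail:
    mov eax, 1                  ; sys_exit(1)
    mov ebx, 1
    int 0x80
```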
