Author Topic: Bash my code. It works but is it done right? (Read 23949 times)

JoeCoder · « **on:** June 21, 2011, 06:55:09 PM »

Hi guys. Thanks to the guys here (Keith, Rob, Frank, Loco) to name a few I have written my first x86 code. I actually wrote something before but I started out using an example from a book. This is all mine and I have a few questions on it (please see comments in the code). If you can answer my questions or if you have any other comments or suggestions that will be great, because I don't have any confidence that I did things the best way or the right way. It seems to be working correctly but that doesn't mean it's good. Sorry for the unaligned contents. It looks perfect in emacs but apparently the tabs don't work outside of emacs. Also any nops are just there for the debuggers I'm using.

Code: [Select]

;    invoke as  ./test < inputfilename
;    or just ./test
;    if you use it with console stdin then type control d to end input
;
section .data
writmsg  db    "Char was: ",0,10
writmlen equ   $-writmsg	 

readcnt  dd    0		          ; diagnostic, will remove
	

section  .bss
buffalen equ   8192		          ; 8k

readbuff resb  buffalen		  ; input buffer area


section  .text
         global _start
;------------------------------------------------------------------------;
;  test readchar routine.		                                   ;
;------------------------------------------------------------------------;
_start:  nop
	 call readchar 		  ; al will contain char if one is available
	 jc exit0000  		  ; carry flag will be set in readchar
				          ; if error condition or EOF

	 mov ecx,writmsg	                ; addr of text to write
         mov byte [writmsg+10],al    ; move char to output message
	 mov edx,writmlen	                ; number of bytes to write
	 mov eax,4		                ; syscall = write
	 mov ebx,1		                ; fd 1 = stdout
	 int 80h	                                ; issue write syscall

	 nop
	 jmp _start		                ; repeat until no more data


;--------------------------------------------------------------------------------------;
;  return one character at a time to the caller from stdin. char is      ;
;  returned in al. carry flag is set for error condition, cleared for       ;
;  normal completion a la FreeBSD.					            ;
;--------------------------------------------------------------------------------------;
readchar push ebx		               ; save caller registers
         push ecx
	 push edx

	;; i want to compare the contents of readcurr and readlast
	;; is this the right way?
	 mov eax,[readcurr]	       ; get contents of readcurr
	 cmp eax,[readlast]	               ; any data remaining in buffer?
	 ja .readfil		               ; no, issue read syscall

	 mov eax,[readcurr]	       ; load eax with addr of current byte in buffer
	 mov al,[eax]		               ; save byte at readcurr for caller
	 inc dword [readcurr]	       ; inc input buffer pointer
	 clc			                       ; indicate normal completion
	 jmp .readexi		               ; restore and return

	
.readfil mov eax,3		               ; syscall = read
	 mov ebx,0		               ; fd 0 = stdin
	 mov ecx,readbuff	               ; address of buffer
	 mov edx,buffalen	               ; number of bytes to read (max)
	 int 80h		                       ; issue syscall

	 cmp eax,0		               ; did read return an error or eof?
	 jna .readerr		               ; yes, set cf and bail out

	 inc dword [readcnt]	       ; diagnostic counter of successful reads
	
	 ;; save the address of readbuff in readcurr
	 ;; is this the right way to do this?
	 lea edx,[readbuff]
	 mov dword [readcurr],edx

	 ;; save the address of the last input byte in readlast
	 ;; is this the right way to do this?
	 lea edx,[readbuff]
	 add edx,eax		
	 dec edx		
	 mov dword [readlast],edx

	 mov eax,[readcurr]	       ; load eax with addr of current byte in buffer	
	 mov al,[eax]		               ; save byte at readcurr for caller

	 ;; is this the right way to advance readcurr to next char in buffer?
	 inc dword [readcurr]	       ; increment input buffer pointer
	
	 clc			                       ; clear carry flag to indicate normal completion
	 jmp .readexi		               ; go to exit
	
	
.readerr stc			               ; set carry flag to indicate error
	
.readexi pop edx		               ; restore input registers
	 pop ecx
	 pop ebx

	 ret			                       ; return to caller

	
;-----------------------------------------------------------------------------------------------;
;  data for readchar procedure. Can I make these names local to readchar ;
;  even though they are in .data? OO assembly? Would be way cool!          ;
;-----------------------------------------------------------------------------------------------;	
section .data
readcurr dd    01h		  ; address of next available byte in
	                                  ; input buffer (readbuff)
readlast dd    00h		  ; address of last available byte in
				          ; input buffer (readbuff)
	
section .text	
align	4
;------------------------------------------------------------------------;
;  exit to system                                                                ;
;------------------------------------------------------------------------;
exit0000 mov eax,1		  ; exit syscall
	 mov ebx,0		          ; return code zero
	 int 80h		                  ; exit

Frank Kotler · « **Reply #1 on:** June 22, 2011, 02:57:36 AM »

Yeah! That seems to work quite well. I made a couple of minor changes - mostly a matter of "style"... I returned "readcnt" as an exit code, so we can read it with "echo $?" instead of using a debugger to see it. Strictly speaking, only bl is relevant as an exit code, the upper bits of ebx are ignored. So long as the number of reads doesn't exceed 255, this works.

If we're actually reading from the keyboard, sys_read returns every time we hit "enter". I wasn't sure what happens when it's redirected... which is why I did the above. Turns out, as I expected, but wasn't sure, it keeps reading past the linefeed for the full size of the buffer (I temporarily reduced the size of the buffer to check out what happened with multiple reads).

I tried making your variables "local" by simply prepending ".". I wasn't sure this would work with the variables in .data... but it does. If, in the course of modifying the code, you introduced a "non-local" label between 'em, this would break. A more "robust" way would be to put "readfile" and its variables in a separate file ("modular programming" rather than "monofile programming" - I advocate "modular", but often do "monofile"). We'd want to pass the buffer and its size as parameters, I imagine. We could leave these "hard coded", too, but would have to declare 'em "global" in the calling file and "extern" in the called file. Better to pass 'em as parameters, I think. Dunno what to do with "readcnt" in this case, but it's only a "temporary diagnostic" anyway. I may try an example of this, just to do it...

This is not much different from what you posted...

Code: [Select]

; nasm -f elf32 myfile.asm
; ld -o myfile myfile.o

;    invoke as  ./test < inputfilename
;    or just ./test
;    if you use it with console stdin then type control d to end input
;
section .data
writmsg  db    "Char was: "

; we can give the "target" a name, instead of counting bytes...
the_char db 0,10
; this won't screw up our "length"

writmlen equ   $-writmsg	 

readcnt  dd    0		          ; diagnostic, will remove
	

section  .bss
buffalen equ   8192		          ; 8k

readbuff resb  buffalen		  ; input buffer area


section  .text
         global _start
;------------------------------------------------------------------------;
;  test readchar routine.		                                   ;
;------------------------------------------------------------------------;
_start:  nop
	 call readchar 		  ; al will contain char if one is available
	 jc exit0000  		  ; carry flag will be set in readchar
				          ; if error condition or EOF

	 mov ecx,writmsg	                ; addr of text to write
; instead of "+ 10", we can use a "named variable"
;         mov byte [writmsg+10],al    ; move char to output message
         mov byte [the_char],al    ; move char to output message

	 mov edx,writmlen	                ; number of bytes to write
	 mov eax,4		                ; syscall = write
	 mov ebx,1		                ; fd 1 = stdout
	 int 80h	                                ; issue write syscall

	 nop
	 jmp _start		                ; repeat until no more data


;--------------------------------------------------------------------------------------;
;  return one character at a time to the caller from stdin. char is      ;
;  returned in al. carry flag is set for error condition, cleared for       ;
;  normal completion a la FreeBSD.					            ;
;--------------------------------------------------------------------------------------;
readchar push ebx		               ; save caller registers
         push ecx
	 push edx

	;; i want to compare the contents of readcurr and readlast
	;; is this the right way?
	 mov eax,[.readcurr]	       ; get contents of readcurr
	 cmp eax,[.readlast]	               ; any data remaining in buffer?
	 ja .readfil		               ; no, issue read syscall

	 mov eax,[.readcurr]	       ; load eax with addr of current byte in buffer
	 mov al,[eax]		               ; save byte at readcurr for caller
	 inc dword [.readcurr]	       ; inc input buffer pointer
	 clc			                       ; indicate normal completion
	 jmp .readexi		               ; restore and return

	
.readfil mov eax,3		               ; syscall = read
	 mov ebx,0		               ; fd 0 = stdin
	 mov ecx,readbuff	               ; address of buffer
	 mov edx,buffalen	               ; number of bytes to read (max)
	 int 80h	                       ; issue syscall

	 cmp eax,0		               ; did read return an error or eof?
	 jna .readerr		               ; yes, set cf and bail out

	 inc dword [readcnt]	       ; diagnostic counter of successful reads
	
	 ;; save the address of readbuff in readcurr
	 ;; is this the right way to do this?
	 ; yeah. could also have used "mov edx, readbuff"
	 lea edx,[readbuff]
	 mov dword [.readcurr],edx

	 ;; save the address of the last input byte in readlast
	 ;; is this the right way to do this?
	 ; yeah... we already had "readbuff" in edx
	 ; so could have skipped doing it again... no harm
	 lea edx,[readbuff]
	 add edx,eax		
	 dec edx		
	 mov dword [.readlast],edx

	 mov eax,[.readcurr]	       ; load eax with addr of current byte in buffer	
	 mov al,[eax]		               ; save byte at readcurr for caller

	 ;; is this the right way to advance readcurr to next char in buffer?
	 ; yeah.
	 inc dword [.readcurr]	       ; increment input buffer pointer
	
	 clc		                       ; clear carry flag to indicate normal completion
	 jmp .readexi		               ; go to exit
	
	
.readerr stc			               ; set carry flag to indicate error
	
.readexi pop edx		               ; restore input registers
	 pop ecx
	 pop ebx

	 ret			                       ; return to caller

	
;-----------------------------------------------------------------------------------------------;
;  data for readchar procedure. Can I make these names local to readchar ;
;  even though they are in .data? OO assembly? Would be way cool!          ;
; yeah, kinda...
;-----------------------------------------------------------------------------------------------;	
section .data
.readcurr dd    01h		  ; address of next available byte in
	                                  ; input buffer (readbuff)
.readlast dd    00h		  ; address of last available byte in
				          ; input buffer (readbuff)
	
section .text	
align	4
;------------------------------------------------------------------------;
;  exit to system                                                                ;
;------------------------------------------------------------------------;
exit0000 mov eax,1		  ; exit syscall
;	 mov ebx,0		          ; return code zero

; further diagnostics...
	mov ebx, [readcnt]	; return count...
; we can display exit-code with "echo $?"

	 int 80h		                  ; exit

Best,
Frank

JoeCoder · « **Reply #2 on:** June 22, 2011, 07:33:51 AM »

Quote from: Frank Kotler on June 22, 2011, 02:57:36 AM

Yeah! That seems to work quite well. I made a couple of minor changes - mostly a matter of "style"... I returned "readcnt" as an exit code, so we can read it with "echo $?" instead of using a debugger to see it. Strictly speaking, only bl is relevant as an exit code, the upper bits of ebx are ignored. So long as the number of reads doesn't exceed 255, this works.

Hi Frank. Thanks alot for taking the time to go over my code and helping. I am looking at your changes side by side with my original now and I will update after I'm done looking at it.
I added readcnt just to satisfy my curiosity that it actually worked on big files, I will take it out when I'm done testing. Thanks for mentioning the thinking about the exit code being in bl. I didn't know that.

Quote from: Frank Kotler on June 22, 2011, 02:57:36 AM

If we're actually reading from the keyboard, sys_read returns every time we hit "enter". I wasn't sure what happens when it's redirected... which is why I did the above. Turns out, as I expected, but wasn't sure, it keeps reading past the linefeed for the full size of the buffer (I temporarily reduced the size of the buffer to check out what happened with multiple reads).

Yeah, that is actually what I want. I want to package this function as a library somehow (I'll ask about that soon) and the caller should get all chars. Specifically I want to process lines of text from an input file and be able to identify line ends on various platforms so I need this function to return all chars. I may wrapper it later to give logical lines etc. At this point it's intended to work on files instead of console input but it was easier at the beginning to use stdin. I'll also have to change it to accept an fd that I already opened and use files without redirecting.

Quote from: Frank Kotler on June 22, 2011, 02:57:36 AM

I tried making your variables "local" by simply prepending ".". I wasn't sure this would work with the variables in .data... but it does. If, in the course of modifying the code, you introduced a "non-local" label between 'em, this would break. A more "robust" way would be to put "readfile" and its variables in a separate file ("modular programming" rather than "monofile programming" - I advocate "modular", but often do "monofile").

That's amazing and would be very useful to encapsulate stuff in the monofile mode. I intend to make a bunch of callable routines and package them as a library I can link with. The code I have here is just a test case that includes the routine, I agree that's not how it should end up. Because I didn't know how to create a separate callable routine or library yet. BTW in my work we often intentionally create monolithic, huge source files because it makes reversing our stuff somewhat harder. But we have tools to do it so it remains easy to manage (not that everybody does it the right way but there are ways). And we have to deal with code segments that can't be larger than 4096K and you guys thought your 64k segments were unreasonable 20 years ago LOL

So we have a few tricks...

Quote from: Frank Kotler on June 22, 2011, 02:57:36 AM

We'd want to pass the buffer and its size as parameters, I imagine. We could leave these "hard coded", too, but would have to declare 'em "global" in the calling file and "extern" in the called file. Better to pass 'em as parameters, I think. Dunno what to do with "readcnt" in this case, but it's only a "temporary diagnostic" anyway. I may try an example of this, just to do it...

Yeah readcnt is going to get thrown away it was just a sanity check in my testing because I had a big input file but it was the same stuff over and over again and I wanted to verify at certain points in the debugger x number of reads got done based on a file of a given size divided by my 8k buffer choice. I don't know how I should manage the buffer and size but yeah it has to be unique by fd or it will be useless. This is a pretty minimal scenario just because it's all I could deal with for my first x86 code. I guess the easiest thing would be to have the caller deal with the buffer and size like you said, since without libc there is not any storage management in Linux. I still can't get over how they decided to implement that, I think it's pretty poor.

How would you pass parameters to a routine like this? Would you use a bunch of registers like a normal syscall and if so where would I save them in my routine? Or would you pass the address of a parameter list (that's how we would do it in my work). I guess maybe it would work to push all the arguments onto the stack and then just offset from the stack pointer while processing them? I will have to eventually write a little C to see how the compilers are dealing with this, but I am putting that off as long as I can because I can't stand C or ATT syntax assembly code!

Thanks alot, more questions later!

Joe

JoeCoder · « **Reply #3 on:** June 22, 2011, 08:03:17 AM »

I didn't want to have to code a hard offset into my msg but I didn't think of how to do it. I didn't like the idea of putting a named field after it because somebody can come along later and screw up the alignment. In MVS I would setup an equate to offset from the message so even if somebody screwed up the storage layout by inserting lines I would still point at the right position, but then again we have storage to storage moves of arbitrary length. I don't know how I would do it in x86 in general, although I guess in this case where it's only one char something like

Code: [Select]

writmsg db "This is a X message"
writxxx  equ writmsg+10
writmsgl equ $-writmsg
.
.
     mov byte [writxxx],al

Will that work?

Also in the code where you said

lea edx,[readbuff]

could have (should have?) been replaced with

mov edx,readbuff

I have to laugh. One thing that's really REALLY confusing to me is how you can move a name or [name] which is almost a high-level deref. And the meaning of the [] seems to change in context. I don't understand these rules yet and obviously they're basic to getting anything done in x86.

What I am used it is the name always refers to the field itself. Unless I really concentrate on it, I just don't get the idea of

mov edx,readbuff

because in my work that would be moving the 4 byte contents of readbuff (not the address of readbuff!) to edx. We learn over and over again not to do that. And now in x86 I have to unlearn that LOL. OMG my brain hurts!

After my syscall at .readfil you pointed out edx already contains the address of readbuff. I didn't realize that i could depend on the syscall preserving my input values. Is that generally a safe assumption in all syscalls and library calls, or do you need to check each one individually? Edit: oh maybe because *I* had just loaded it a few instructions before? So the question is are you saying the syscall didn't modify it, or just that I just loaded it (duh, face palm!

) and didn't need to do it again, just so I understand what you said.

Very cool that x86 has inc/dec support for memory operands. We have to load/store for that, we don't have any way to increment a memory operand.

Code: [Select]

	;; i want to compare the contents of readcurr and readlast
	;; is this the right way?
	 mov eax,[.readcurr]	       ; get contents of readcurr
	 cmp eax,[.readlast]	               ; any data remaining in buffer?
	 ja .readfil		               ; no, issue read syscall

And double checking on this, is this the best way to code this comparison? It seems burdensome because I would normally have the pointers already in registers. Maybe this is the best we can do in this situation, or is there a better way?

Thanks Frank!

Frank Kotler · « **Reply #4 on:** June 22, 2011, 12:32:08 PM »

You raise a number of good questions that deserve discussion - maybe in several more posts. Right now, I just got a "two part" version of you code working, so I'll just post that. It addresses one of your questions - how do we pass parameters? We can just put 'em in registers, but the "conventional" way is to pass parameters on the stack - which is how I've done it. I like "examples", so...

The calling code:

Code: [Select]

; nasm -f elf32 myfile.asm
; nasm -f elf32 readchar.asm
; ld -o myfile myfile.o readchar.o

;    invoke as  ./test < inputfilename
;    or just ./test
;    if you use it with console stdin then type control d to end input
;
extern readchar
global readcnt

section .data
writmsg  db    "Char was: "
the_char db 0,10
writmlen equ   $-writmsg	 

readcnt  dd    0		          ; diagnostic, will remove
	

section  .bss
buffalen equ   8192		          ; 8k

readbuff resb  buffalen		  ; input buffer area


section  .text
         global _start
;------------------------------------------------------------------------;
;  test readchar routine.		                                   ;
;------------------------------------------------------------------------;
_start:  nop

    push buffalen
    push readbuff
	 call readchar 		  ; al will contain char if one is available
	 jc exit0000  		  ; carry flag will be set in readchar
				          ; if error condition or EOF

	 mov ecx,writmsg	                ; addr of text to write
; instead of "+ 10", we can use a "named variable"
;         mov byte [writmsg+10],al    ; move char to output message
         mov byte [the_char],al    ; move char to output message

	 mov edx,writmlen	                ; number of bytes to write
	 mov eax,4		                ; syscall = write
	 mov ebx,1		                ; fd 1 = stdout
	 int 80h                                ; issue write syscall

	 nop
	 jmp _start		                ; repeat until no more data


;------------------------------------------------------------------------;
;  exit to system                                                                ;
;------------------------------------------------------------------------;
exit0000 mov eax,1		  ; exit syscall
;	 mov ebx,0		          ; return code zero

; further diagnostics...
	mov ebx, [readcnt]	; return count...
; we can display exit-code with "echo $?"

	 int 80h		                  ; exit

And the called subroutine:

Code: [Select]

; nasm -f elf32 readchar.asm

global readchar
extern readcnt

;--------------------------------------------------------------------------------------;
;  return one character at a time to the caller from stdin. char is      ;
;  returned in al. carry flag is set for error condition, cleared for       ;
;  normal completion a la FreeBSD.					            ;
;--------------------------------------------------------------------------------------;
readchar:
    push ebp
    mov ebp, esp

	 push ebx		               ; save caller registers
         push ecx
	 push edx

	;; compare the contents of readcurr and readlast
	 mov eax,[readcurr]	       ; get contents of readcurr

	;; make sure we read something on first call
	 test eax, eax
	 jz .readfil

	 cmp eax,[readlast]	               ; any data remaining in buffer?
	 ja .readfil		               ; no, issue read syscall

	 mov eax,[readcurr]	       ; load eax with addr of current byte in buffer
	 mov al,[eax]		               ; save byte at readcurr for caller
	 inc dword [readcurr]	       ; inc input buffer pointer
	 clc			                       ; indicate normal completion
	 jmp .readexi		               ; restore and return

	
.readfil mov eax,3		               ; syscall = read
	 mov ebx,0		               ; fd 0 = stdin
	;; get buffer and size from stack
	 mov ecx,[ebp + 8]	               ; address of buffer
	 mov edx,[ebp + 12]	               ; number of bytes to read (max)
	 int 80h	                       ; issue syscall

	 cmp eax,0		               ; did read return an error or eof?
	 jna .readerr		               ; yes, set cf and bail out

	 inc dword [readcnt]	       ; diagnostic counter of successful reads
	
	 ;; save the address of readbuff in readcurr
	 mov dword [readcurr], ecx

	 ;; save the address of the last input byte in readlast
	 add ecx,eax		
	 dec ecx		
	 mov dword [readlast],ecx

	 mov eax,[readcurr]	       ; load eax with addr of current byte in buffer	
	 mov al,[eax]		               ; save byte at readcurr for caller

	 ;; advance readcurr to next char in buffer
	 inc dword [readcurr]	       ; increment input buffer pointer
	
	 clc		                       ; clear carry flag to indicate normal completion
	 jmp .readexi		               ; go to exit
	
.readerr stc			               ; set carry flag to indicate error
	
.readexi pop edx		               ; restore input registers
	 pop ecx
	 pop ebx
	mov esp, ebp
	pop ebp
	 ret			                       ; return to caller

	
;-----------------------------------------------------------------------------------------------;
;  data for readchar procedure. Can I make these names local to readchar ;
;  even though they are in .data? OO assembly? Would be way cool!          ;
; yeah, kinda...
;-----------------------------------------------------------------------------------------------;	
section .bss
readcurr resd 1		  ; address of next available byte in
	                                  ; input buffer (readbuff)
readlast resd 1		  ; address of last available byte in
				          ; input buffer (readbuff)

We assemble both files separately, and link them together to form an executable, as indicated in the comments at the top of the first file. ("myfile" is a generic name I use - it's actually named "joetest3.asm" - "a rose by any other name..." and all

)

I've kept your "readcnt". I realize that it's only "temporary" - useful to keep track of what's going on while we try things, but not useful in the "finished product". I'm not ready to get rid of it yet.

I've made it a "global variable" - declared "global" in the file where it occurs, and "extern" in the file that (also) uses it. Without those declarations, it would be "undefined" in "readchar.asm", and Nasm would decline to assemble it. The linker figures out its address and fills it in where it's used in "readchar.asm".

I think pushing the parameters before calling "readfile" is obvious enough - we could easily enough push a file descriptor as a third parameter, if (when!) needed. When we get to the called routine, we've got a choice of how we find the parameters on the stack. We could reference them with respect to esp. [esp] would be the return address, [esp + 4] would be the first parameter (buffer), and [esp +8] would be the second parameter (size). But wait! You've pushed three registers, at four bytes each, so we have to add 12 to those numbers. That puts "buffer" at [esp + 16] and "size" at [esp + 20]. Not having to keep track of that is one of the advantages of doing it the "other way"...

Code: [Select]

push ebp  ; our caller was probably using it
mov ebp, esp
; at this point, ebp is "fixed" and can't be used as "general purpose"!

Now we can push registers all we want, and our parameters remain at [ebp + 8], [ebp + 12], [ebp + 16], etc.

Mmmm, I just realized that the code above is wrong! After pushing the parameters and calling "readchar" I should have "removed" the parameters from the stack...

Code: [Select]

    push buffalen
    push readbuff
	 call readchar 		  ; al will contain char if one is available
;    add esp, 8
; no, write it this way (easy to modify then we add "fd"!)
    add esp, 4 * 2  ; 4 bytes per parameter times two parameters...

I'll have to rework that and get back to ya!

Best,
Frank

JoeCoder · « **Reply #5 on:** June 22, 2011, 03:28:29 PM »

Hi Frank. Thanks so much for your help. You answered some of my questions before I asked like how to deal with referring to symbols outside the code etc. It's similar to how I am used to doing it but different enough I will have to pay attention.

As far as the stack offsets go I have a couple of questions for you but I will have to update tomorrow because I'm in the middle of something for work right now. One thing I wanted to ask, I would probably define equates to offset the stack (and use the *4 multiplier in the equates) to make things even more clear. Does that sound like a good idea in x86? We deal with this issue alot actually not because we have stacks (we don't) but because we often don't have storage layouts for stuff we have to work with because everything is closed source. After we (ahem) reverse engineer (i did not say that) what we need to we sometimes create a sparse list of equates that map the fields we need since we don't need everything in a big control block. I use a special naming convention for these fields so it's obvious in the code. I thought the same idea might work well in this case. And I was able to answer my question about making an equate for the insertion point in the message text. It works great. Ah, something that's almost the same between what I am used to working on and what I am trying to learn, equ is almost identical in both.

JoeCoder · « **Reply #6 on:** June 23, 2011, 08:30:08 AM »

Oops I think there are a few "issues" in the updated versions. Will that test eax,eax really be enough to work? I understand how it works for the first read but I don't understand how it will work for subsequent reads, since my code is based on the comparison between current and last and neither is ever 0. Don't know if it's because of the stack stuff we're still working out or not, but changing the buffer length to a small number like 5 to test the new read logic resulted in a tight i/o loop leading to me having to kill my VM and loss of my Linux userid! AFAIAC Linux (and probably all desktop os) have a long way to go until they get to the point where they are what they should be. I haven't seen any user code in more than 30 years able to mess up an MVS system but stuff like this that monopolizes the processor means you can't open a new terminal window to kill the process and there's nothing built into Linux to kill it automatically. And that was even using the debugger. It happened three times and caused an error in the debugger that led to this. Bummer! I guess I should have been more cautious when you said it wouldn't work and you would rework it and get back to me

Now I will try to understand what you did about saving ESP, I didn't get it the first time I looked at it.

Thanks,

Joe

Frank Kotler · « **Reply #7 on:** June 23, 2011, 09:41:58 AM »

Hi Joe,

I'm batting zero on this...

Code: [Select]

push buffalen
push readbuff
call readchar
add esp, 8

This balances your stack, but trashes the carry flag you're depending on! Oops! We can do the "add" with:

Code: [Select]

lea esp, [esp + 8]

without altering flags. I regret that I haven't gotten back to this. Honestly, I think I like your original version better than the "improvements" I've suggested!

The "test eax, eax" was added to fix another "improvement" I made - moving your variables to .bss instead of .data... where they do get initialized to zero (both of 'em)! Maybe the best thing is to move 'em back to .data, initialized as you had 'em - which assured a "read" on the first call.

On a "good day" I can write something of this length without typos or logic errors. On a "less good day", I make two typos and three logic errors per line. Sometimes I don't know which kind of day I'm having until I realize how screwed up my code is (although the typos should have given me a clue). On the bright side, you get to learn from my mistakes in addition to your own.

Seriously, you're doing quite well! I'll get back to this, but I can't promise when. Hopefully soon and a "good day"!

Best,
Frank

JoeCoder · « **Reply #8 on:** June 23, 2011, 11:14:29 AM »

Hi Frank, thanks for your time on this. I learned a lot from looking over what you wrote in comments and code and I don't have any specific technical questions at this point aside from what is the deal on saving esp in ebp but I was meaning to look at that this morning and got sidetracked until now so maybe I will figure that out soon. Most of my questions at this point are about why do things this way versus that way etc. I hope you and the rest of the guys can speak to those kinds of questions when you have time. Joe

Rob Neff · « **Reply #9 on:** June 23, 2011, 12:03:50 PM »

Quote from: JoeCoder on June 23, 2011, 11:14:29 AM

I don't have any specific technical questions at this point aside from what is the deal on saving esp in ebp

This is done mainly as a way to access function arguments and local stack variables using fixed offsets and to restore the stack pointer back to it's original value at the end of the function.

For example:

Code: [Select]

%define myfirstarg 8
%define mystackvar 4

myfunc:
    push ebp
    mov  ebp, esp
    sub   esp, 32    ; create some local variable space on the stack
    mov  eax, dword [ebp + myfirstarg]   ; get first arg contents
    mov  dword [ebp - mystackvar], eax  ; copy it
    
    .
    .   ; do other stuff
    .

    ; restore stack frame and register before exiting
    mov  esp, ebp
    pop  ebp
    ret

This shell code is basically how most calling conventions work on x86. On x64 it's only slightly more complicated as args are passed in registers.
Hope that helps.

JoeCoder · « **Reply #10 on:** June 23, 2011, 12:52:17 PM »

Hi Rob, thanks for helping with my question. I think I understand the part about offsetting from the stack, what I don't get is why we have to save esp on the way in in this particular example. Is it only so we don't have to constantly adjust our offsets to stack references if we push and pop stuff in our own routine? So there is no need to save esp in this specific piece of code?? Or is there something else I'm missing?

Frank Kotler · « **Reply #11 on:** June 23, 2011, 04:09:25 PM »

Quote

I haven't seen any user code in more than 30 years able to mess up an MVS system.

Let me have a shot at it!

Seriously, sorry you had trouble with it. Pushing the parameters and not removing them (in a loop like this) would cause the stack to grow (downward) continuously. Didn't cause a problem in my "small" tests, but would eventually. Trying to clean up the stack with an "add" would mean you'd never get the carry flag, so would go on forever. My bad!

Here's today's version. No guarantee it's any better.

Code: [Select]

; nasm -f elf32 myfile.asm
; nasm -f elf32 readchar.asm
; ld -o myfile myfile.o readchar.o

;    invoke as  ./test < inputfilename
;    or just ./test
;    if you use it with console stdin then type control d to end input
;

extern readchar		; the function we're testing
global readcnt		; diagnostic

section .data

; yeah, we can do it this way...
; but if someone inserts lines "writmlen" will still suffer...

writmsg  db    "Char was: ", 0,10
the_char equ writmsg + 10
writmlen equ   $-writmsg	 

readcnt  dd    0		          ; diagnostic, will remove
	

section  .bss
buffalen equ    8192		          ; 8k

readbuff resb  buffalen		  ; input buffer area


section  .text
         global _start
;------------------------------------------------------------------------;
;  test readchar routine.		                                   ;
;------------------------------------------------------------------------;
_start:  nop

        push buffalen
        push readbuff
	call readchar 		  ; al will contain char if one is available
        lea esp, [esp + 4 * 2]    ; instead of "add" - to preserve flags
	jc exit0000  		  ; carry flag will be set in readchar
			          ; if error condition or EOF

	 mov ecx,writmsg	                ; addr of text to write
         mov byte [the_char],al			; move char to output message

	 mov edx,writmlen	                ; number of bytes to write
	 mov eax,4		                ; syscall = write
	 mov ebx,1		                ; fd 1 = stdout
	 int 80h                                ; issue write syscall

	 nop
	 jmp _start		                ; repeat until no more data


;------------------------------------------------------------------------;
;  exit to system                                                                ;
;------------------------------------------------------------------------;
exit0000 mov eax,1		  ; exit syscall
;	 mov ebx,0		          ; return code zero

; further diagnostics...
	mov ebx, [readcnt]	; return count...
; we can display exit-code with "echo $?"

	 int 80h		                  ; exit

And...

Code: [Select]

; nasm -f elf32 readchar.asm

global readchar		; our function
extern readcnt		; diagnostic - to be removed!

;--------------------------------------------------------------------------------------;
;  return one character at a time to the caller from stdin. char is      ;
;  returned in al. carry flag is set for error condition, cleared for       ;
;  normal completion a la FreeBSD.					            ;
;--------------------------------------------------------------------------------------;
readchar:
        push ebp		; save caller's ebp
        mov ebp, esp		; create a "stack frame"

	 push ebx		               ; save caller registers
         push ecx
	 push edx

	;; compare the contents of readcurr and readlast
	 mov eax,[readcurr]	       ; get contents of readcurr
	 cmp eax,[readlast]            ; any data remaining in buffer?
	 ja .readfil	               ; no, issue read syscall

	 mov eax,[readcurr]	       ; load eax with addr of current byte in buffer
	 mov al,[eax]	               ; save byte at readcurr for caller
	 inc dword [readcurr]	       ; inc input buffer pointer
	 clc	                       ; indicate normal completion
	 jmp .readexi	               ; restore and return

	
.readfil mov eax,3		               ; syscall = read
	 mov ebx,0		               ; fd 0 = stdin
	;; get buffer and size from stack
	 mov ecx,[ebp + 8]	               ; address of buffer
	 mov edx,[ebp + 12]	               ; number of bytes to read (max)
	 int 80h	                       ; issue syscall

	 cmp eax,0		               ; did read return an error or eof?

;	 jna .readerr		               ; yes, set cf and bail out
; whups! This almost got by me. "jna" ("above") is for unsigned cmp
; if error, 0FFFFFFFFh or so, this is "above" zero, so no jump!
; for a signed comparison, so that -ERRNO is less than zero, use "jng".
; error is unlikely in this case, but best to get it right.

	 jng .readerr		               ; yes, set cf and bail out

	 inc dword [readcnt]	       ; diagnostic counter of successful reads
	
	 ;; save the address of readbuff in readcurr
	 mov dword [readcurr], ecx

	 ;; save the address of the last input byte in readlast
	 add ecx,eax		
	 dec ecx		
	 mov dword [readlast],ecx

	 mov eax,[readcurr]	       ; load eax with addr of current byte in buffer	
	 mov al,[eax]	               ; save byte at readcurr for caller

	 ;; advance readcurr to next char in buffer
	 inc dword [readcurr]	       ; increment input buffer pointer
	
	 clc	                       ; clear carry flag to indicate normal completion
	 jmp .readexi	               ; go to exit
	
.readerr stc		               ; set carry flag to indicate error
	
.readexi pop edx	               ; restore input registers
	 pop ecx
	 pop ebx

; destroy the stack frame - not strictly necessary here
; but if we had local variables, this would "free" them,
; and guarantee that esp was ready for the pop ebp and ret
	mov esp, ebp
	pop ebp
	 ret	                       ; return to caller

	
;-----------------------------------------------------------------------------------------------;
;  data for readchar procedure. Can I make these names local to readchar ;
;  even though they are in .data? OO assembly? Would be way cool!          ;
; yeah, kinda...
;-----------------------------------------------------------------------------------------------;	
section .data
readcurr dd 1		  ; address of next available byte in
                          ; input buffer (readbuff)
readlast dd 0		  ; address of last available byte in
		          ; input buffer (readbuff)

Proceed at your own risk, but I think it's better... Note that I found a minor glitch in your original code. See the comments.

Best,
Frank

Rob Neff · « **Reply #12 on:** June 23, 2011, 04:13:16 PM »

Register ebp is saved as it is considered non-volatile when interfacing with other functions. All called functions must not modify it's value without restoring it back before returning. Unlike other registers opcodes generated using it are assumed to be offsets within the STACK segment just like esp. That's why it makes a nice little placeholder for the constant offsets. Today, of course, we mostly program in the flat memory model where the segment registers cs/ds/es/ss most likely all point to the same starting region in memory anyways. esp will be modified by pushes/pops/etc during the course of your function and thus can make it extremely difficult to reference your function arguments and local variables.

Now, with that said, you CAN avoid it's use altogether if you're writing small tightly-coded functions. As a matter of fact, on x64 the rsp register is normally optimized to a constant offset which is then used in place of rbp.

A 32-bit example of my previous example:

Code: [Select]


%define MYLOCALSTACKSIZE 32
%define myfirstarg MYLOCALSTACKSIZE+4
%define mysecondarg MYLOCALSTACKSIZE+8
%define mystackvar1 0
%define mystackvar2 4

myfunc:
    sub   esp, MYLOCALSTACKSIZE    ; create some local variable space on the stack

    ; CAUTION: any pushes from this point forward can throw your stack offsets out of whack...

    mov  eax, dword [esp + myfirstarg]   ; get first arg contents
    mov  dword [esp + mystackvar1], eax  ; copy it
    
    .
    .   ; do other stuff
    .

    ; restore stack frame
    add  esp, 32
    ret

As you can see, we can avoid the use of ebp completely but now must be aware of any stack altering opcodes that may interfere with our offsets. Pay close attention to the differences between this example and the previous one!
Without being cautious you can get bit when setting up the stack with parameters to a function you want to call. Of course, if you are only calling your own functions you are free to create your own calling convention.

Edited to correct param offsets since we no longer push ebp...

JoeCoder · « **Reply #13 on:** June 23, 2011, 05:02:28 PM »

Quote from: Frank Kotler on June 23, 2011, 04:09:25 PM

Quote
I haven't seen any user code in more than 30 years able to mess up an MVS system.

Let me have a shot at it!

I'm serious, I really don't know of any way to do it. There are so many protections built in because of what's at stake. Downtime is extremely expensive and those systems were designed from the beginning with reliability as a cornerstone of everything. They were not without problems in the beginning but they cleaned up fast and now they're virtually uncrashable. Like I said I haven't ever heard of it happening in 30 years and that's working at software companies where we work with many sites and review alot of dumps, so we know what went wrong. There were subsystems like CICS that had some exposures, but they were mostly cleaned up in the late 1980s and anyway the problems they had didn't bring down the OS. It's now to the point that even many vendor (system) code problems don't crash the machine, the recovery and management are *that* good.

If you want to fool around with a 1970's era system there is always Hercules and the version 3.8 that IBM put out in the public domain. I don't know how hard that is for a user program to crash but if you can't crash that you certainly can't break anything newer. Kind of a pain to work with though. Newer versions are much nicer. And if you have time to kill and want to try your hand at MVS assembler (best damn operating system and assembler ever written!) there is a pretty cool Java emulator at www.z390.org. The guy who wrote it has been doing stuff like this since the 1970's, he actually used to have a simulator and cross-assembler from the PC to IBM 370's! Anyway you can't run more than one program at a time on that and you don't have access to most of the systems services, but it will assemble a program and run it and it's fun and maybe a good learning tool for people who can't afford a real system (and that is everybody).

Quote from: Frank Kotler on June 23, 2011, 04:09:25 PM

Seriously, sorry you had trouble with it. Pushing the parameters and not removing them (in a loop like this) would cause the stack to grow (downward) continuously. Didn't cause a problem in my "small" tests, but would eventually.

No problem. I am just not happy with the state of desktop OS development, this should never be able to happen. My test was to shorten the buffer so I could overrun it and cause another read, and that's when the debugger went nuts and the system started looping so hard I couldn't kill the process. That is why I have been developing in a VM, even though it's just user code. I don't trust this stuff and here's a good example why not! If I had done that on my main desktop I would not be a happy camper but I still wouldn't have blamed you for trying to help me.

Quote from: Frank Kotler on June 23, 2011, 04:09:25 PM

Trying to clean up the stack with an "add" would mean you'd never get the carry flag, so would go on forever. My bad!

Again, thanks and no problemo. I'm sensitive to those issues because we try to get some pipelining going by knowing what opcodes set condition codes so we can overlap a little in epilog. I have been scanning the AMD doc to see if any of the opcodes I used affected rflags but in the end I was able to get everything out of the way.

Quote from: Frank Kotler on June 23, 2011, 04:09:25 PM

Here's today's version. No guarantee it's any better.

Thanks, I'll go over it tomorrow hopefully before I get lost for the weekend.

One thing I did see in your comments real quick:

Quote from: Frank Kotler on June 23, 2011, 04:09:25 PM

; jna .readerr ; yes, set cf and bail out
; whups! This almost got by me. "jna" ("above") is for unsigned cmp
; if error, 0FFFFFFFFh or so, this is "above" zero, so no jump!
; for a signed comparison, so that -ERRNO is less than zero, use "jng".
; error is unlikely in this case, but best to get it right.

Thanks, that's important and I missed it. We have those issues also but I know what instructions to use. On x86 I don't, and I need to keep an eye on that. For us it's actually a little easier since the extended mnemonics for branching aren't different between signed and unsigned although the register math certainly is and needs to be used right. Thanks for not letting it slip by, it would have come back to haunt me later.

and

Quote from: Frank Kotler on June 23, 2011, 04:09:25 PM

; destroy the stack frame - not strictly necessary here
; but if we had local variables, this would "free" them,
; and guarantee that esp was ready for the pop ebp and ret
   mov esp, ebp
   pop ebp
    ret    ; return to caller

Thanks, that's what I finally concluded, but I wanted to ask because at first I was sure I was missing something.

What I will probably do is define equates for all the offsets from the stack pointer to avoid magic numbers. I don't know if people go that far but I am used to trying to be as defensive as I can be.

Thanks again Frank.

Joe

JoeCoder · « **Reply #14 on:** June 23, 2011, 05:05:06 PM »

Thanks Rob. My confusion is just in this particular example why did we have to preserve esp and the answer is we didn't. I understand it's good practice and thanks for telling me we're not allowed to change it by convention. I assume all that until I know otherwise, so no problem there. Thanks for your help and code examples!

NASM - The Netwide Assembler

News:

Author Topic: Bash my code. It works but is it done right? (Read 23949 times)

JoeCoder

Bash my code. It works but is it done right?

Frank Kotler

Re: Bash my code. It works but is it done right?

JoeCoder

Re: Bash my code. It works but is it done right?

JoeCoder

Re: Bash my code. It works but is it done right?

Frank Kotler

Re: Bash my code. It works but is it done right?

JoeCoder

Re: Bash my code. It works but is it done right?

JoeCoder

Re: Bash my code. It works but is it done right?

Frank Kotler

Re: Bash my code. It works but is it done right?

JoeCoder

Re: Bash my code. It works but is it done right?

Rob Neff

Re: Bash my code. It works but is it done right?

JoeCoder

Re: Bash my code. It works but is it done right?

Frank Kotler

Re: Bash my code. It works but is it done right?

Rob Neff

Re: Bash my code. It works but is it done right?

JoeCoder

Re: Bash my code. It works but is it done right?

JoeCoder

Re: Bash my code. It works but is it done right?