Lemme deal with part 1 first. I see I made some assumptions that were apparently not correct...
; need a buffer to read into in ecx
mov ecx, input_buffer
1. What exactly is going into ecx here?
"input_buffer" is the address of the buffer, which you had already declared. Strictly, it's the "offset" part of the address, but in the "flat" memory model that both Windows and Linux use, the "base" part of the address is zero, so the offset is the whole address.
What value does input_buffer have at this point?
The "[contents]" of the buffer are supposedly "uninitialized", but are in fact initialized (by the OS when it loads our program) to zeros - 200 of 'em.
; and the (maximum) count
mov edx, 200
2. Why is 200 declared here, if 200 bytes is already declared for input_buffer?
We're telling sys_read how many bytes to read (maximum), that is, how much space is available in the buffer... including the linefeed (Enter key) which terminates sys_read. Although we have asked to reserve 200 bytes, sys_read doesn't know this until we tell it - here. If the pesky user types more than what we've allowed in edx, the excess remains in the OS's input buffer - I think of it as the "keyboard buffer" - and will screw up the next read, ours or the shell's. It is safer to flush this - look for the linefeed and if we don't see it, read more into a "dummy" buffer and throw it away until we do see the linefeed. Since the assignment specified "not more than 200 bytes", I didn't do this. Safer to do it!
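For what it's worth, the flushing could look something like the sketch below. It assumes 32-bit Linux and NASM with "int 0x80", same as the rest of this code. The "dummy" buffer and the "flush" and "done" labels are made up for illustration, and it reads the leftovers one byte at a time, which is slow but simple.

```nasm
; Sketch only: 32-bit Linux, NASM. "dummy", "flush" and "done" are
; hypothetical names, not part of the original program.
section .bss
    input_buffer resb 200
    dummy        resb 1

section .text
global _start
_start:
    mov eax, 3                  ; sys_read
    mov ebx, 0                  ; stdin
    mov ecx, input_buffer
    mov edx, 200                ; maximum count
    int 0x80

    cmp eax, 0
    jle done                    ; error or end of file: nothing to flush
    cmp byte [input_buffer + eax - 1], 10
    je done                     ; last byte is the linefeed: nothing left over

flush:
    mov eax, 3                  ; sys_read again, one byte into the dummy
    mov ebx, 0
    mov ecx, dummy
    mov edx, 1
    int 0x80
    cmp eax, 1
    jne done                    ; end of file or error: give up
    cmp byte [dummy], 10
    jne flush                   ; keep discarding until we see the linefeed

done:
    mov eax, 1                  ; sys_exit
    xor ebx, ebx
    int 0x80
```

Assemble and link with "nasm -f elf32" and "ld -m elf_i386". Note that the linefeed test uses the byte count sys_read left in eax, so it has to happen before eax gets reused.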
mov esi, input_buffer
mov edi, output_buffer
3. Not too familiar with esi and edi instructions yet.. what are these two lines of code doing?
I asked if you knew the "string" instructions, and then didn't wait for an answer. The "string" instructions - lodsb, stosb, movsb, scasb, cmpsb, insb, and outsb - and their "w" and "d" and "q" friends (I think that's all of them) use ds:(e|r)si as a source and/or es:(e|r)di as a destination, depending on the instruction. Since we're in "flat" memory model, ds: and es: refer to the same memory, so you don't need to worry about the segment part (in real mode you do). "lodsb" loads al from [esi] and advances esi to point to the next byte. "stosb" stores al in [edi] and advances edi to point to the next byte. "lodsw" loads ax and advances esi by two bytes, etc, etc, etc. Here, we're just setting up the "source index" and "destination index" for future use.
mov ecx, eax ; count of bytes typed
4. Where are the values for eax coming from? Because when I look at eax above it holds the value "3" for the sys call read argument
Right. But when sys_read returns - when it sees the Enter key being hit - eax holds the number of bytes actually read... or an error number. By rights, it is always wise to check for an error! Since reading and writing to stdin and stdout are "unlikely" to encounter an error, I skipped that part and ASSumed no error... thus, bytes read. If you were reading/writing to a disk file, for example, it would be important to check for errors (a negative number between -1 and -4095). "man 2 read" claims that it returns -1 and the actual error number is in "errno", but that's the "C wrapper" - we get the negative of the error number in eax. "errno.h" has the numbers.
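A minimal sketch of that error check, assuming 32-bit Linux and NASM. The labels "read_failed" and "finished" are invented for the example, and using the error number as the exit status is just one convenient way to see it (check with "echo $?" in the shell).

```nasm
; Sketch only: labels are hypothetical.
section .bss
    input_buffer resb 200

section .text
global _start
_start:
    mov eax, 3                  ; sys_read
    mov ebx, 0                  ; stdin
    mov ecx, input_buffer
    mov edx, 200
    int 0x80

    test eax, eax
    js  read_failed             ; negative: eax holds -errno
    jz  finished                ; zero: end of file, nothing was read
    mov ecx, eax                ; positive: count of bytes read
    ; ... the conversion loop would go here ...

finished:
    mov eax, 1                  ; sys_exit
    xor ebx, ebx                ; exit status 0
    int 0x80

read_failed:
    neg eax                     ; turn -errno back into errno
    mov ebx, eax                ; exit status (the shell sees only the low 8 bits)
    mov eax, 1                  ; sys_exit
    int 0x80
```

The zero case is worth separating out: hitting Ctrl-D on an empty line gives a count of 0, which is not an error but also leaves nothing to convert.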
top:
; get a byte/character into al
lodsb
5. What does lodsb do in this case?
Loads al from [esi] (in input_buffer) and advances esi to point to the next byte.
push eax ; save a copy
; isolate high nibble
shr al, 4
xlat ; alias of xlatb
6. I don't see the xlat command in my instructor's slides, what is it doing here?
Essentially, mov al, [ebx + al]. There's no such instruction - the sizes don't match - but that's about what xlat does. If it does not appear in your instructor's slides, perhaps you're not supposed to be using it. There are other ways, but it may be too late!
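One of those "other ways", sketched under some assumptions: 32-bit Linux, NASM, and a 16-byte table of digit characters that I've called "hex_digits" here (the name and the uppercase digits are made up for the example, as is the "out" buffer). The trick is to zero-extend al into a full 32-bit register so it can legally be used as an index.

```nasm
; Sketch only: "hex_digits" and "out" are hypothetical names.
section .data
    hex_digits db "0123456789ABCDEF"

section .bss
    out resb 3

section .text
global _start
_start:
    mov al, 0x4B                    ; sample byte (the letter 'K')

    movzx eax, al                   ; clear the upper bits so eax can be an index
    mov edx, eax                    ; save a copy (instead of push)
    shr eax, 4                      ; isolate high nibble: 0..15
    mov al, [hex_digits + eax]      ; what xlat does when ebx = hex_digits
    mov [out], al

    mov eax, edx                    ; get the copy back
    and eax, 0x0F                   ; isolate low nibble
    mov al, [hex_digits + eax]
    mov [out + 1], al
    mov byte [out + 2], 10          ; linefeed

    mov eax, 4                      ; sys_write
    mov ebx, 1                      ; stdout
    mov ecx, out
    mov edx, 3
    int 0x80                        ; prints "4B"

    mov eax, 1                      ; sys_exit
    xor ebx, ebx
    int 0x80
```

This costs a register for the copy but uses only plain mov, shr and and, which are more likely to be in the slides.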
stosb ; put it in output buffer
7. Are you referring to the output_buffer variable, if so, how is this instruction doing that?
It's the other one of those "string" instructions that we're using. Moves al - now "translated" to a hex digit - to [edi], which we pointed to "output_buffer", and advances edi to the next byte.
This reminds me of an important point that I haven't mentioned! In the "flags register" - where the zero flag, carry flag, etc. live - there's a "direction flag" whose function is to control the direction of the "string" instructions. If the flag is set - the "std" instruction - the "string" instructions work "down". That is, esi or edi will be decremented to point to the previous byte. It is considered rude to leave this flag pointed "down". It is wise to do "cld" to clear the direction flag so we go "up". I never set it "down" so I ASSumed it was set "up". This is a fairly serious error in the code I posted! Really should do "cld" before using the "string" instructions... unless you want to work "down" which is sometimes useful. My bad!
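To show where the "cld" belongs, here is a tiny stand-alone sketch (32-bit Linux, NASM) that copies a message with lodsb/stosb and prints the copy. The names "msg", "copy" and "next" are invented for the example and it isn't the hex-dump program itself.

```nasm
; Sketch only: all labels are hypothetical.
section .data
    msg     db "up we go", 10
    msg_len equ $ - msg

section .bss
    copy resb msg_len

section .text
global _start
_start:
    cld                         ; clear direction flag: esi/edi will count "up"
    mov esi, msg                ; source index
    mov edi, copy               ; destination index
    mov ecx, msg_len
next:
    lodsb                       ; al = [esi], esi = esi + 1
    stosb                       ; [edi] = al, edi = edi + 1
    loop next

    mov eax, 4                  ; sys_write
    mov ebx, 1                  ; stdout
    mov ecx, copy
    mov edx, msg_len
    int 0x80

    mov eax, 1                  ; sys_exit
    xor ebx, ebx
    int 0x80
```

One "cld" before the first string instruction is enough, as long as nothing in between does "std".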
add edx, 3
8. Why storing 3 in edx?
We're not "storing" it, we're adding it. We zeroed edx at the beginning of this loop, and we've added 3 bytes to our output buffer - the two hex digits and the space. When we're done, this will be the number of bytes to print (sys_write).
loop top ; do 'em all
This is the same as the following (except that "loop" leaves the flags alone, while "dec" changes them):
dec ecx
jnz top
You probably knew that one...
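One gotcha with either form: if ecx is already zero going in (say sys_read returned 0 because of Ctrl-D on an empty line), the decrement wraps around to 0xFFFFFFFF and the loop runs about four billion times. A guard in front of the loop avoids that. This is only a sketch, assuming 32-bit Linux and NASM, and the "skip" label and the counter in ebx are made up for the demonstration.

```nasm
; Sketch only: pretends sys_read returned 0 and shows the guard.
section .text
global _start
_start:
    mov ecx, 0                  ; pretend the byte count came back as 0
    xor ebx, ebx                ; iteration counter, used as exit status below
    jecxz skip                  ; ecx == 0: don't enter the loop at all
top:
    inc ebx                     ; stand-in for the real loop body
    dec ecx
    jnz top                     ; same job as "loop top"
skip:
    mov eax, 1                  ; sys_exit, status = number of iterations
    int 0x80
```

Run it and "echo $?" should show 0 iterations. "jecxz" only reaches a short distance, so if the loop body is long, "test ecx, ecx" followed by "jz" does the same job.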
mov al, 10 ; linefeed
stosb
inc edx
9. What is going on here?
We're just adding a linefeed to the end of the buffer, and counting it. Just for a neater display - without it the shell prompt would be on the same line as our output.
; now print it out
mov eax, 4 ; sys_write
mov ebx, 1 ; stdout
mov ecx, output_buffer
10. How is the data actually getting passed into output_buffer?
We put it there with the "stosb"s.
; edx should be all set
... 'cause we counted 'em!
I'll get to "part 2" soon. Remind me if I don't.
Best,
Frank