Well... "the stack" is a perfectly ordinary piece of memory, distinguished only by the fact that ss:sp (or esp or rsp) points to it. You probably know that the stack works downward, from high addresses in memory to low addresses. The memory isn't "upside down" or anything, but the instructions that implicitly use the stack - call, ret, push, pop - decrement sp (or esp or rsp) by 2 (or 4 or 8) when putting something "on" the stack (call and push) and increment sp/esp/rsp by 2/4/8 when taking something "off" the stack (ret and pop)...
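A minimal sketch of that arithmetic, assuming a 16-bit .com file (spdemo is a made-up name). It prints nothing; step through it in a debugger to watch sp move:

```nasm
; nasm -f bin -o spdemo.com spdemo.asm
org 100h
    mov bx, sp      ; bx = sp before we touch the stack
    push ax         ; push: sp = sp - 2, then ax is stored at ss:sp
    mov cx, sp      ; cx = bx - 2
    pop ax          ; pop: ax is loaded from ss:sp, then sp = sp + 2
    mov dx, sp      ; dx = bx again
    sub bx, cx      ; bx = 2, the size of one 16-bit stack slot
    int 20h         ; return to dos
```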
; prints the value of ax as decimal, hex, and binary ascii
; nasm -f bin -o showax.com showax.asm
org 100h
Since we're a .com file, dos picks a single segment (lowest one available, usually) for our entire program. The first 256 (100h) bytes of that segment are the "Program Segment Prefix" (PSP). This isn't part of our on-disk .com file - dos creates it as it loads us. The first two bytes of the PSP are 0CDh, 20h - int 20h - the "return to dos" interrupt. There are some other "interesting" parts of the PSP (segment address of our environment variables, for example)...
The stack is at the top of our one-and-only segment. Dos sets sp to zero and pushes a zero (the push wraps sp around from 0000 to FFFE), so our stack looks like:
xxxx:FFFE 0000
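One consequence of that zero, sketched as a separate minimal .com file (tiny.asm is a hypothetical name): a plain near "ret" at the top level pops the zero into ip, so execution lands on the int 20h at offset 0 of the PSP and the program exits. This assumes cs and sp are still as dos set them up.

```nasm
; nasm -f bin -o tiny.com tiny.asm
org 100h
    mov ah, 2       ; int 21h subfunction 2: print the character in dl
    mov dl, '!'
    int 21h
    ret             ; pops the 0000 dos pushed -> jumps to cs:0000 = int 20h
```

This only works while the stack is balanced; any stray push left behind would be popped as the "return address" instead.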
section .data
variable dw 12345
section .text
mov ax, [variable] ; "[contents]" of variable
call ax2dec
The "call" puts the return address (106h in this case - the offset of the instruction right after the call) on the stack...
xxxx:FFFE 0000
xxxx:FFFC 0106
... and execution continues at the new address. When we get to the "ret" in our subroutine, the return address is removed from the stack (we're back to just the zero), and execution continues from the instruction after the call - "call newline" in the full program.
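To see that "call" really just pushes the offset of the next instruction, here is a small sketch as its own .com file (callpop.asm is a made-up name). It pops the return address instead of returning to it; the offsets in the comments assume org 100h and the 3-byte near call that NASM emits here.

```nasm
; nasm -f bin -o callpop.com callpop.asm
org 100h
    call next       ; 3 bytes at 100h..102h; pushes 0103h, then jumps to next
next:
    pop ax          ; ax = 0103h, the return address; only the dos zero is left
    ret             ; pops that zero -> int 20h at the start of the PSP
```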
In the subroutine, I save and restore all the registers that I alter. This isn't strictly necessary - the usual convention is to return a "meaningful value" in ax, and other registers can be either preserved (bp, bx, si, and di usually) or trashed. Since I wanted to be able to use this for debugging purposes with minimal side-effects, I save 'em all - pusha and popa (186 and later) could have been used (shorter but slower, probably), but I did it this way...
ax2dec:
push ax
xxxx:FFFE 0000 ; from dos
xxxx:FFFC 0106 ; return address from subroutine
xxxx:FFFA 3039 ; old ax - 12345 decimal
push bx
xxxx:FFFE 0000 ; from dos
xxxx:FFFC 0106 ; return address from subroutine
xxxx:FFFA 3039 ; old ax - 12345 decimal
xxxx:FFF8 0000 ; old bx - probably zero?
push cx
xxxx:FFFE 0000 ; from dos
xxxx:FFFC 0106 ; return address from subroutine
xxxx:FFFA 3039 ; old ax - 12345 decimal
xxxx:FFF8 0000 ; old bx - probably zero?
xxxx:FFF6 ???? ; old cx
push dx
xxxx:FFFE 0000 ; from dos
xxxx:FFFC 0106 ; return address from subroutine
xxxx:FFFA 3039 ; old ax - 12345 decimal
xxxx:FFF8 0000 ; old bx - probably zero?
xxxx:FFF6 ???? ; old cx
xxxx:FFF4 ???? ; old dx
Now we can use these registers for our own purposes, and restore 'em to their original values later...
mov bx, 10 ; divide by ten
xor cx, cx ; zero our counter
.push_digit:
xor dx, dx ; clear dx for the div
div bx ; dx:ax/bx -> ax quotient, dx remainder
push dx ; save remainder
Since we get these remainders in the opposite order from which we eventually want 'em, pushing them on the stack and popping them off is a convenient way to reverse the order (there are other ways to do this). When we're done looping through this section, the stack will look like this:
xxxx:FFFE 0000 ; from dos
xxxx:FFFC 0106 ; return address from subroutine
xxxx:FFFA 3039 ; old ax - 12345 decimal
xxxx:FFF8 0000 ; old bx - probably zero?
xxxx:FFF6 ???? ; old cx
xxxx:FFF4 ???? ; old dx
xxxx:FFF2 0005 ; first remainder
xxxx:FFF0 0004 ; etc.
xxxx:FFEE 0003
xxxx:FFEC 0002
xxxx:FFEA 0001
At this point, the quotient in ax is zero...
inc cx ; bump digit counter
or ax, ax ; is quotient zero?
jnz .push_digit ; no, do more
mov ah, 2 ; print character subfunction
.pop_digit:
pop dx ; get remainder back
Now we get the digits back in the "proper" order, one at a time. Note that we don't need to pop into the same register that we pushed from - it just works out that way for int 21h/2, which wants the character in dl. If we wanted to use "stosb" or "int 29h", which take the character in al, we could have used "pop ax" just as well. Also note that we're only interested in a single byte, but we can't push/pop a single byte - only 2 bytes (or 4 or 8 in 32- or 64-bit code). There are some options here: with an operand-size prefix we can push/pop 4 bytes in 16-bit code (386 or later) or 2 bytes in 32-bit code, and in 64-bit code we can still push 2 bytes, but not 4. Better to stick to the "native" size, to avoid confusion, IMO. In any case, once we've popped the cx digits we counted while pushing them, the stack is back to:
xxxx:FFFE 0000 ; from dos
xxxx:FFFC 0106 ; return address from subroutine
xxxx:FFFA 3039 ; old ax - 12345 decimal
xxxx:FFF8 0000 ; old bx - probably zero?
xxxx:FFF6 ???? ; old cx
xxxx:FFF4 ???? ; old dx
add dl, '0' ; convert to ascii character
int 21h ; print it
loop .pop_digit ; cx times
pop dx
xxxx:FFFE 0000 ; from dos
xxxx:FFFC 0106 ; return address from subroutine
xxxx:FFFA 3039 ; old ax - 12345 decimal
xxxx:FFF8 0000 ; old bx - probably zero?
xxxx:FFF6 ???? ; old cx
...and dx is back to its original value...
pop cx
pop bx
pop ax
etc...
xxxx:FFFE 0000 ; from dos
xxxx:FFFC 0106 ; return address from subroutine
At this point, the next thing on the stack had damn well better be our return address, or we're "off in the weeds" - probably crash!
ret
... and we're back to where we were called from, with just the zero that dos put on the stack.
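For comparison, here is a sketch of the pusha/popa alternative mentioned earlier, as a complete .com file. The label ax2dec_pusha and the file name are made up for this example, and the top-level "ret" relies on the zero that dos pushed (it lands on the int 20h at the start of the PSP). pusha/popa need a 186 or later.

```nasm
; nasm -f bin -o pushademo.com pushademo.asm
org 100h
    mov ax, 12345
    call ax2dec_pusha   ; prints 12345
    ret                 ; pops the dos zero -> int 20h at PSP:0000

ax2dec_pusha:
    pusha               ; saves ax, cx, dx, bx, sp, bp, si, di (8 words)
    mov bx, 10          ; divide by ten
    xor cx, cx          ; zero our digit counter
.push_digit:
    xor dx, dx          ; clear dx for the div
    div bx              ; dx:ax/bx -> ax quotient, dx remainder
    push dx             ; save remainder
    inc cx              ; bump digit counter
    or ax, ax           ; is quotient zero?
    jnz .push_digit     ; no, do more
    mov ah, 2           ; print character subfunction
.pop_digit:
    pop dx              ; get remainder back
    add dl, '0'         ; convert to ascii character
    int 21h             ; print it
    loop .pop_digit     ; cx times
    popa                ; restores everything pusha saved, ax included
    ret
```

The trade-off is that popa restores ax too, so this form can't hand a result back in ax without extra work; for a print-only debugging helper that doesn't matter.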
I don't save/restore ax in the other subroutines, since after being rotated the correct number of times it's back to its original value anyway. I've probably made typos in the above, but it should give the general idea...
The stack is also used for passing parameters. I may get to that... if I get to it. :)
Best,
Frank