It might be easier to understand the two kinds of "ret" by first looking at how the functions that end with them are called. As you probably know, there are different "calling conventions" - agreed-upon rules for interfacing with library (and other) code. In "cdecl", used for the C library and other places, it is the caller's responsibility to "clean up the stack" or "balance the stack" by "removing" the parameters that were pushed there.
push 42
push fmt_string ; pretend it's db "%d", 0
call _printf
add esp, 8 ; "clean up stack" - 2 parameters, 4 bytes each
We don't need to see the _printf code to know it ends with a plain "ret". As you can see, nothing is actually "removed" from the stack - the parameters are still there - we've just moved esp (the "stack pointer") above them. The next time the stack is used, they'll be overwritten.
In the "stdcall" convention, used by Windows APIs (and other places), it is the callee's responsibility to "clean up the stack".
push 0 ; msgbox style
push caption
push message
push 0 ; hWnd (no owner window)
call MessageBoxA
; we don't need to "clean up stack"
; the parameters have been "removed" for us
We don't get to see the code for MessageBoxA, but we know it ends in "ret 16". The operand is the number of bytes to "remove" from the stack - not the number of parameters but the number of bytes (the same kind of count we added to esp in the cdecl example). As you can imagine, this doesn't work for functions with a variable number of parameters - like _printf - because the callee can't know, when it's assembled, how many bytes each caller is going to push.
Which one you should use to end your subroutines depends on which convention you used to call 'em.
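To see both endings from the callee's side, here's a minimal sketch. It assumes 32-bit Linux (NASM, ld, and the int 0x80 exit call) only so that it can run without any library - the post above doesn't depend on that. The names add_cdecl and add_stdcall are made up for illustration.

```nasm
; Assumed build: nasm -f elf32 ret_demo.asm && ld -m elf_i386 -o ret_demo ret_demo.o
global _start

section .text

; cdecl-style callee: plain "ret", parameters stay on the stack
add_cdecl:
    mov eax, [esp + 4]      ; last-pushed parameter
    add eax, [esp + 8]      ; first-pushed parameter
    ret

; stdcall-style callee: "ret 8" pops the return address,
; then adds 8 to esp to skip the two 4-byte parameters
add_stdcall:
    mov eax, [esp + 4]
    add eax, [esp + 8]
    ret 8

_start:
    push 2
    push 40
    call add_cdecl
    add esp, 8              ; caller balances the stack
    mov ebx, eax            ; ebx = 42

    push 1
    push ebx
    call add_stdcall        ; no cleanup needed afterwards

    mov ebx, eax            ; exit status = 43
    mov eax, 1              ; sys_exit (Linux, assumed)
    int 0x80
```

Running it and then checking the shell's exit status (echo $?) should show 43. If you swap the two endings without changing the callers, esp ends up 8 bytes off after each call, which is exactly the mismatch to avoid.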
Besides who gets to "clean up stack", the calling conventions specify which registers need to be preserved (ebp, ebx, esi, and edi) and which ones can be altered (eax, ecx, and edx - eax carries the return value).
Back in the good old days of 16-bit code, addressing modes were quite limited - [bx] and [bp] were the only two "base registers" that could be used - [sp] was not a valid effective address. bp was also special in that it defaulted to using ss as a segment register to form the complete address. So if we wanted to access parameters passed on the stack, we didn't have much choice but to create a "stack frame" using bp.
myfunc:
; create a stack frame
push bp ; save caller's register
mov bp, sp ; set the "frame pointer" to current sp
sub sp, 16 ; make room for "local" variables
mov ax, [bp + 4] ; get last parameter pushed
mov [bp - 6], ax ; store it in a local variable (for no reason)
; destroy the stack frame
mov sp, bp ; "free" local variables and restore sp
pop bp ; restore caller's reg
ret
In 32-bit code, any general-purpose register can be a "base" register, and (in the usual flat memory model) cs, ds, es, and ss all cover the same memory, so ebp isn't as "special" as it was, but it is still used as the "frame pointer". We don't even need to set up a stack frame at all.
myfunc:
mov eax, [esp + 4] ; get last parameter pushed
...
But it is still convenient to use a stack frame. It allows a debugger to "back trace" through a call chain, and it makes local variables easier to use (we could use locals indexed off esp, but every push or pop changes their offsets, and it would soon drive us crazy keeping track of 'em). Compilers usually do it in unoptimized builds, though optimizing builds often leave the frame pointer out.
myfunc:
push ebp ; our caller was using this
mov ebp, esp ; set our frame pointer
sub esp, 4 ; room for one local variable
mov eax, [ebp + 8] ; last-pushed parameter
push esi ; convention expects us to preserve this
mov esi, [ebp + 12] ; next parameter
add eax, esi
mov [ebp - 4], eax ; store in local variable
pop esi ; restore caller's esi, per convention
xor eax, eax ; pretend we did something useful with eax
mov eax, [ebp - 4] ; get eax back from local
; note that parameters and locals are at the
; same offset from ebp, whether we've put
; something on the stack or not.
; destroy stack frame
leave
; does the same thing as:
; mov esp, ebp ; restore esp to where it was
; pop ebp ; restore caller's ebp
ret
As shown, "leave" is a shorter way of doing "mov esp, ebp" and "pop ebp". You might see either. If you haven't got local variables (and didn't butcher the stack, meaning every push has been matched by a pop), you don't need the "mov esp, ebp" - esp already equals ebp at that point.
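Here's a small sketch of that last case: a frame with no locals, where a plain "pop ebp" is enough. As before, the 32-bit Linux exit call is just an assumption to make it runnable, and sum3 is a made-up name.

```nasm
; Assumed build: nasm -f elf32 frame_demo.asm && ld -m elf_i386 -o frame_demo frame_demo.o
global _start

section .text

; hypothetical stdcall function: returns the sum of its three parameters
sum3:
    push ebp
    mov ebp, esp            ; frame pointer, but no "sub esp, N"
    push esi                ; preserved register - balanced by the pop below
    mov esi, [ebp + 8]      ; last-pushed parameter
    add esi, [ebp + 12]
    add esi, [ebp + 16]     ; first-pushed parameter
    mov eax, esi            ; return value
    pop esi                 ; esp is equal to ebp again
    pop ebp                 ; so no "mov esp, ebp" is needed
    ret 12                  ; three parameters, 4 bytes each

_start:
    push 7
    push 15
    push 20
    call sum3               ; callee removes the parameters
    mov ebx, eax            ; exit status = 42
    mov eax, 1              ; sys_exit (Linux, assumed)
    int 0x80
```

If the "pop esi" were forgotten, the "pop ebp" would load the saved esi into ebp and "ret" would jump to the saved ebp - which is the kind of trouble "mov esp, ebp" (or "leave") protects against.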
There's a matching "enter" instruction:
myfunc:
enter 4, 0
; does the same as:
; push ebp
; mov ebp, esp
; sub esp, 4
...
You won't often see it, because while it's smaller, it's slower. What's that second operand? Don't ask! It's a "lex level" (lexical nesting level) - it allows a function to access its caller's variables (and its caller's caller's variables, etc.). I think Pascal uses it. I never have. Just make it zero.
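For completeness, a sketch of "enter" and "leave" used as a pair, with the lex level left at zero. The function name double_it is made up, and the exit call again assumes 32-bit Linux.

```nasm
; Assumed build: nasm -f elf32 enter_demo.asm && ld -m elf_i386 -o enter_demo enter_demo.o
global _start

section .text

; hypothetical cdecl function: returns its one parameter times two
double_it:
    enter 4, 0              ; push ebp / mov ebp, esp / sub esp, 4
    mov eax, [ebp + 8]      ; the only parameter
    mov [ebp - 4], eax      ; park it in the local variable
    add eax, [ebp - 4]      ; eax = parameter + parameter
    leave                   ; mov esp, ebp / pop ebp
    ret                     ; cdecl: the caller cleans up

_start:
    push 21
    call double_it
    add esp, 4              ; one parameter, 4 bytes
    mov ebx, eax            ; exit status = 42
    mov eax, 1              ; sys_exit (Linux, assumed)
    int 0x80
```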
Best,
Frank