Well, BIOS interrupts are 16-bit real-mode code, so they would work in DOS, DOSBox, or "my own OS" (until you switch to pmode). For a "real OS" (protected from US!), we need to ask the OS to do the output - WriteFile or WriteConsole (heck, MessageBoxA would do it) for Windows. I use sys_write in Linux (the same call should work for OSX too, though the calling convention differs). I get the impression you're using Windows... which I know less about...
In the method you show, we push the remainders on the stack and pop 'em off to get 'em in the "right order", since the digits come out rightmost first and we print leftmost first. Instead, we could put 'em in a buffer (adding '0' to each remainder to make it a character), starting at the "top" or right end of the buffer and working back toward the "beginning". We may want to start by storing a zero (the number 0, not the character '0') to make a zero-terminated string. When we run out of digits (quotient is zero) we probably aren't at the "beginning" of the buffer. We could save this position as the "start print" position, or we could space-pad to the beginning of the buffer (right-justified numbers look better if we're printing a column of numbers). Once we've got 'em in a buffer, it's pretty much "hello world".
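Here's a rough sketch of that buffer idea, in C only because it's easier to check than asm; the loop translates to "div" and a store pretty much line for line. The names (u32_to_dec, NUMBUF_SIZE, the "pad" flag) are made up for the example, and it assumes an unsigned 32-bit number.

```c
#include <stdint.h>

/* Hypothetical helper, names invented for this sketch.
   10 digits is the most a 32-bit unsigned value needs, plus 1 for the
   terminating zero. */
enum { NUMBUF_SIZE = 11 };

/* Fills buf from the right end backwards and returns the "start print"
   position. If pad is nonzero, the unused left part is filled with
   spaces and the returned pointer is the beginning of the buffer. */
char *u32_to_dec(uint32_t n, char buf[NUMBUF_SIZE], int pad)
{
    char *p = buf + NUMBUF_SIZE - 1;  /* "top" (right end) of the buffer */

    *p = 0;                           /* the number 0, not the character '0' */

    do {
        *--p = (char)('0' + n % 10);  /* remainder -> digit character */
        n /= 10;                      /* quotient is the next number to split */
    } while (n != 0);                 /* do/while so that 0 still prints "0" */

    if (pad) {
        while (p > buf)               /* space-pad back to the beginning */
            *--p = ' ';
    }
    return p;
}
```

To print, hand the returned pointer to whatever output call your OS gives you (puts in C; WriteFile or sys_write from asm, where you'd pass the length too). With pad set, every number comes out 10 characters wide, so a column of them lines up on the right.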
To eliminate the slow "div"... Hmmm... I think "Brethren" has got an example in the "example code" section here. I'll look for that. "Terje's method" (Terje Mathisen) may be better - the AMD optimization manual has it (they don't credit him - maybe they developed it independently) and I think Agner Fog's work uses it. I'm gonna have to "get back to you" on that... Remind me if I don't...
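In the meantime, the basic trick for dodging "div" is to multiply by a scaled reciprocal of 10 and shift. To be clear, this is NOT Terje's method or Brethren's code, just the simplest form of the idea, sketched in C with a made-up function name (div10_mul). It assumes an unsigned 32-bit input. In asm it comes down to "mul" by 0CCCCCCCDh followed by "shr edx, 3".

```c
#include <stdint.h>

/* Hypothetical name, invented for this sketch.
   Returns n / 10 without a divide instruction.
   0xCCCCCCCD is 2^35 / 10 rounded up; the 64-bit product shifted right
   by 35 gives the exact quotient for every 32-bit unsigned n. */
uint32_t div10_mul(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);
}

/* The remainder (the digit we want) falls out of the quotient:
   n - q * 10, which is one more multiply (or lea/add) and a subtract. */
uint32_t mod10_mul(uint32_t n)
{
    return n - div10_mul(n) * 10u;
}
```

Drop these in place of the divide in the digit loop and the rest of the conversion stays the same.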
Later,
Frank