To summarize my argument against the use of C functions in "standalone" assembly code, here's how C does its magic:
The main() function isn't really the "main function": a lot of preparation happens before main() is called. A standalone assembly program has no dynamically allocatable space; that structure, and how it works, must be set up before main() starts. The same thing happens when you try to use functions such as printf() or scanf(): C is locale aware, and the default locale information must be loaded before main(). Notice that printf() and scanf() implicitly use the streams stdout and stdin, respectively, and I'm not talking about file descriptors 1 and 0, but about pointers to the opaque FILE structure. These must be initialized before main() as well. I think you get the idea: a lot of initialization code runs before main() is called, and a lot of finalization code runs after main() returns.
Add to that some more "eccentric" things C functions must be aware of (signals, for instance) and you'll begin to understand how slow some "useful" C functions are.
Am I saying that printf() is statically linked to your code? NO! Nowadays the standard C library is dynamically linked, even if early bound... BUT, all the state kept by the library is allocated in YOUR process address space, and all the library initialization and finalization is done there as well.
This means YOUR code is always called by the "C Runtime Library", usually a simple object file (like crt0.o in UNIX systems) or some code inside imported static libraries like msvcrt.lib.
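To make the "code runs before main()" point concrete, here is a minimal C sketch. It assumes GCC or Clang, because it relies on their constructor attribute (an extension, not standard C), and the names runs_before_main and startup_code_ran are made up for this illustration. The same startup path that prepares stdio and the locale is what calls this hook:

```c
/* Flag flipped by code that the startup files run before main(). */
static int initialized_before_main = 0;

/* GCC/Clang extension (assumption: not standard C). The runtime startup
   code calls every "constructor" function before it calls main(). */
__attribute__((constructor))
static void runs_before_main(void)
{
    initialized_before_main = 1;
}

/* Hypothetical helper: returns 1 if the constructor has already run. */
int startup_code_ran(void)
{
    return initialized_before_main;
}
```

If you call startup_code_ran() as the very first statement of main(), it should already return 1: something ran in your process before "your" entry point did.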
There's no way assembly code that uses a C library function can avoid this hidden initialization/finalization code... and, to use it, you must obey the ABI (Application Binary Interface) of the operating system and of the C library. This means, for example, respecting the stack pointer alignment before calling a function, pushing arguments in reverse order (on i386, for example), restoring the stack pointer after the function call (cdecl calling convention), etc... It also means preserving EBX, ESI, EDI and EBP inside your functions (i386 ABIs for Windows and Unixes), because the caller assumes these registers aren't changed... More: EAX (and maybe EDX) is always used to return integer values from functions (including pointers), but EAX, ECX and EDX are completely free to be changed by the called function (after a call you cannot rely on the values they held before it, except where they hold result values).
Take printf() as an example: The function's prototype is:
int printf( const char *fmt, ... );
It means EAX is always changed by the routine (it holds the returned int), and, probably, ECX and EDX too. Since EBX, ESI, EDI and EBP are always preserved, you can rely on them, but not on EAX, ECX and EDX...
The entire point of writing a "pure" assembly program is to make things run FASTER. The second point, less important, is to create SMALL routines. IMHO, if you use C functions because it is the easy thing to do, you are missing the point of what assembly programming is. A function like printf() is generic enough to be really slow... Printing a simple 5-character string takes, more or less, 200000 (200 thousand) clock cycles. Compare this to a direct system call on Linux, which is about 20 times faster (still slow, but faster than printf).
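To show the two paths being compared, here is a rough C sketch, assuming a POSIX system. The function names are hypothetical, and write() here is only the thin libc wrapper around the system call, not the raw call an assembly routine would make. It shows the difference in the path taken, not a measurement:

```c
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Generic path: printf() parses the format string and goes through the
   buffered FILE stream stdout, which the runtime set up before main(). */
int print_with_printf(const char *s)
{
    int n = printf("%s", s);
    fflush(stdout);              /* push the buffered bytes out now */
    return n;
}

/* Direct path: hand the bytes to the kernel through write(2) on file
   descriptor 1. No format parsing, no FILE buffering. */
ssize_t print_with_write(const char *s, size_t len)
{
    return write(STDOUT_FILENO, s, len);
}
```

In assembly the second function collapses to loading the system call number and three arguments into registers and trapping into the kernel.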
BTW, C compilers tend to create better code than you do by hand. Take a simple function as an example:
_Bool is_divisible_by_4( int x ) { return ( x % 4 ) == 0; }
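A less experienced programmer might think about converting this to:
_is_divisible_by_4:
    push ebp
    mov  ebp,esp
    mov  eax,[ebp+8]   ; x (first cdecl argument)
    cdq                ; sign-extend EAX into EDX:EAX, as signed IDIV expects
    mov  ecx,4
    idiv ecx           ; EAX = quotient, EDX = remainder
    mov  eax,0
    test edx,edx
    setz al            ; AL = 1 if the remainder is zero
    pop  ebp
    ret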
There are several things wrong with this:
1) The prolog/epilog hasn't been necessary since the 80386, which can address the arguments relative to ESP;
2) IDIV is SLOW... On old processors (before the Nehalem microarchitecture, at least) the instruction takes 100~200 clock cycles; on processors up to the Tiger Lake microarchitecture it takes 30~40 cycles; and on modern processors, more or less, 17 cycles;
3) Division isn't necessary to get the remainder here...
Here's the "pure" assembly way of thinking in action:
; Input:  EAX = signed int
; Output: EAX = 1 (divisible) or 0 (not divisible)
; OBS: Any integer divisible by 4 has its 2 lowest bits zeroed!
_is_divisible_by_4:
    and  eax,3   ; keep only bits 0 and 1 and affect ZF.
    setz al      ; set AL to 1 if ZF=1, 0 if ZF=0.
    ret
If you insist on using cdecl for i386 just add a mov eax,[esp+4] before the and instruction.
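If you want to convince yourself that the mask gives the same answer as the remainder, negative values included, here is a small C sketch of both versions side by side. The two function names are made up for the comparison. The cast to unsigned keeps the bit test well defined for negative inputs:

```c
#include <stdbool.h>

/* The original C formulation: remainder of a signed division. */
bool is_divisible_by_4_mod(int x)
{
    return (x % 4) == 0;
}

/* The "assembly way of thinking": test the two lowest bits only.
   Converting to unsigned is done modulo 2^N, and 2^N is itself a
   multiple of 4, so bits 0 and 1 answer the question for any sign. */
bool is_divisible_by_4_mask(int x)
{
    return ((unsigned int)x & 3u) == 0;
}
```

A decent optimizing compiler will usually turn the first version into the second by itself, which is the point about compilers made earlier.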
My argument is: there's no advantage in using standard C library functions in your assembly code. The entire "runtime" must run for them to work properly. This is completely different from writing a C function that doesn't use standard C library functions and calling it from an assembly routine. But that is usually only interesting with code compiled as freestanding, not hosted.
So, what can you do if converting an integer (or a floating point value) to a string is what you need? Write your own conversion routines... They are easy to write (especially for integers) and use way less space than the standard, generic routines. You just have to think as an assembly programmer, not as a C programmer.
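As a rough idea of how small such a routine can be, here is a sketch of a signed integer to decimal string conversion, written in C so the logic is easy to follow before translating it to assembly. The name int_to_dec and its interface are my own choice for this example, and it calls nothing from the C library, so it also fits the freestanding case:

```c
#include <stddef.h>

/* Hypothetical helper: writes the decimal text of value into buf and
   returns its length (not counting the terminating NUL).
   Assumption: buf has room for at least 3 * sizeof(int) + 2 bytes. */
size_t int_to_dec(int value, char *buf)
{
    char tmp[3 * sizeof(int)];   /* digits, least significant first */
    size_t n = 0, len = 0;

    /* Work on the unsigned magnitude so the most negative int does not
       overflow when negated. */
    unsigned int mag = (value < 0) ? 0u - (unsigned int)value
                                   : (unsigned int)value;

    do {                         /* always emits at least one digit */
        tmp[n++] = (char)('0' + mag % 10u);
        mag /= 10u;
    } while (mag != 0);

    if (value < 0)
        buf[len++] = '-';
    while (n > 0)                /* reverse the digits into buf */
        buf[len++] = tmp[--n];
    buf[len] = '\0';
    return len;
}
```

The returned length is exactly what a write system call needs, so no strlen() is required afterwards.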