Most of the time the answer is NO, especially when dealing with libraries such as libc or libm, or any other library designed to be used from C. Why not?
Many standard (ISO 9899) functions are treated as "intrinsics" (built-ins): the compiler knows what they do and how to optimize them, often avoiding the function call altogether. One example is printf. This call:
printf( "Hello\n" );
This is translated, again most of the time, into a single call to a simpler function such as puts or fputs (GCC, for instance, emits a call to puts in this exact case), which is faster. Calling printf here would mean parsing a format string at run time for nothing, and the compiler knows it.
There are other examples: abs(), for instance, is usually compiled inline, with no call at all, as we can see below:
; int f( int x ) { return abs(x); }
f:
mov eax,edi
neg eax
cmovs eax,edi
ret
So an actual call to the abs() in libc is superfluous, and the compiler usually does not emit one.
And, as I said before, good C compilers (GCC, Clang, Intel C++) avoid performance penalties that the average assembly programmer pays no attention to (branch mispredictions, unnecessary data propagation, cache misses, overuse of the stack...).
So when is assembly a good idea? Well... when the high-level language compiler doesn't do a good job. This happens sometimes, especially with poorly designed code. I tend to think about assembly only in terms of performance. If hand-written assembly can improve on your C code by a lot (more than 100%, for example), then -- and only then -- assembly can be a good idea.
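To read the listing above in C terms, here is a minimal sketch of what the three instructions compute. The name abs_like_asm is hypothetical (it is not a libc function), and the sketch assumes a compiler such as GCC or Clang, where converting an out-of-range unsigned value to int simply wraps around.

```c
#include <stdlib.h>

/* Hypothetical helper: mirrors the mov/neg/cmovs sequence in plain C. */
static int abs_like_asm(int x)
{
    /* neg eax: negate with unsigned arithmetic so the wraparound is well defined. */
    unsigned int negated = 0u - (unsigned int)x;

    /* cmovs eax,edi: if the negated value is negative, x was positive, so keep x. */
    return ((int)negated < 0) ? x : (int)negated;
}
```

For ordinary values this returns the same result as abs(), e.g. abs_like_asm(-5) == abs(-5). There is no call and no branch in the generated code, which is exactly why calling into libc would gain nothing.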
Another area is where something is difficult to do in pure C. Let's say we want to set the direction flag and move a block of data backwards. ISO 9899 C has no way to express this: neither memcpy() nor memmove() lets you choose the copy direction. Most modern C/C++ compilers don't offer a way to do it either. Assembly can be the answer.
There is also another use for assembly: making your routines shorter (optimizing for size, not performance). This is useful but, again, I think the best use for assembly is where performance is at stake. But beware: most of the time your assembly code is SLOWER than the equivalent routine written in C. There is only one way to be sure about the gains: MEASURE YOUR ROUTINES.
Here's an example: suppose you want to move a block of data from one buffer to another. We have two pointers and a size as arguments. In C, the best way to do it, if the buffers don't overlap, is to use the memcpy() function. Most of the time your compiler will emit a function call, and you may think this slows your routines down a bit, but consider the alternatives:
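As an illustration of the direction flag case, here is a minimal sketch assuming GCC or Clang extended inline assembly on x86 or x86-64. The name copy_backward is hypothetical, and the plain C loop in the #else branch is only a fallback so the sketch still builds on other targets.

```c
#include <stddef.h>

/* Hypothetical helper: copies n bytes starting from the LAST byte,
   so it is safe when dest overlaps src at a higher address. */
static void copy_backward(void *dest, const void *src, size_t n)
{
    if (n == 0)
        return;

    /* With the direction flag set, movs decrements the pointers,
       so both must start at the last byte of each block. */
    unsigned char *d = (unsigned char *)dest + n - 1;
    const unsigned char *s = (const unsigned char *)src + n - 1;

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ __volatile__ (
        "std\n\t"        /* set DF: string instructions now go backwards */
        "rep movsb\n\t"  /* copy n bytes, last byte first */
        "cld"            /* clear DF again, as the calling convention expects */
        : "+D" (d), "+S" (s), "+c" (n)
        :
        : "memory", "cc"
    );
#else
    /* Fallback for other compilers and targets (an assumption, not in the original). */
    while (n--)
        *d-- = *s--;
#endif
}
```

For example, with char buf[9] = "abcdef__", the call copy_backward(buf + 2, buf, 6) shifts the six letters two positions to the right without corrupting them. Whether this beats memmove() is a separate question that only measurement can answer.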
1 - You can create a simple loop, moving sub-blocks of data individually;
2 - You can use rep/movs (byte, word, dword or qword)
Like:
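#include <stddef.h>  /* size_t */

/* Plain C loop: copies one int per iteration. */
void move1( int *dest, const int *src, size_t elems )
{ while ( elems-- ) *dest++ = *src++; }

/* rep/movs version (GCC/Clang inline assembly, x86 only). */
void move2( int *dest, const int *src, size_t elems )
{
  /* rep movs changes DI, SI and CX, so they are read-write ("+") operands.
     movsl is the AT&T mnemonic for the dword move (movsd in Intel syntax). */
  __asm__ __volatile__ (
    "rep; movsl" : "+D" (dest), "+S" (src), "+c" (elems) : : "memory"
  );
}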
Here I'm moving one DWORD at a time. If you MEASURE these two routines against memcpy( dest, src, elems * sizeof( int ) ); you'll see that the latter is way faster than move1() and move2().
This summarizes my advice, based, of course, on my experience and experiments: when mixing code generated by good C compilers with assembly, don't try to recreate library function calls in assembly on the assumption that your assembly code will be faster than what the high-level compiler generates. In the majority of cases it isn't! Reserve assembly for those cases where the compiler clearly doesn't do a good job (and only after MEASURING the time spent by the routines).
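One way to do that measurement is sketched below, assuming GCC or Clang. The names move1_loop, move_memcpy, move_fn and time_move are hypothetical helpers written for this illustration. The timer is the standard clock(), which is coarse, so use a large number of rounds; the empty asm statement is only a compiler barrier that keeps the optimizer from discarding repeated copies.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Same loop as move1() in the text, repeated so the sketch compiles alone. */
static void move1_loop(int *dest, const int *src, size_t elems)
{
    while (elems--)
        *dest++ = *src++;
}

/* memcpy() wrapped to share the same signature. */
static void move_memcpy(int *dest, const int *src, size_t elems)
{
    memcpy(dest, src, elems * sizeof(int));
}

typedef void (*move_fn)(int *, const int *, size_t);

/* Hypothetical helper: CPU seconds spent on `rounds` calls of fn. */
static double time_move(move_fn fn, int *dest, const int *src,
                        size_t elems, int rounds)
{
    clock_t start = clock();

    for (int i = 0; i < rounds; i++) {
        fn(dest, src, elems);
        /* Compiler barrier (GCC/Clang): dest must really be written each round. */
        __asm__ __volatile__ ("" : : "r" (dest) : "memory");
    }

    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Call time_move( move1_loop, dst, src, n, rounds ) and time_move( move_memcpy, dst, src, n, rounds ) on the same buffers, with optimization enabled, and compare the two numbers. The results depend on the compiler, the flags, the CPU and the block size, so treat them as valid only for the configuration you actually measured.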
[]s
Fred