NASM - The Netwide Assembler
NASM Forum => Example Code => Topic started by: TightCoderEx on June 17, 2012, 01:34:20 AM
-
This thread is not intended to bash "C" or "C++"; rather, those using higher-level languages often question the logic of low-level programming. This example is functionally equivalent to char *strcpy ( char *Dest, char *Src ).
0: 55 push rbp
1: 48 89 e5 mov rbp,rsp
4: 57 push rdi
5: 56 push rsi
6: 8b 75 18 mov esi,DWORD PTR [Src]
9: 8b 7d 10 mov edi,DWORD PTR [Dest]
c: eb 01 jmp f <L0>
e: aa stos BYTE PTR es:[rdi],al
f: ac lods al,BYTE PTR ds:[rsi]
10: 08 c0 or al,al
12: 75 fa jne e <L0-0x1>
14: 5e pop rsi
15: 5f pop rdi
16: c9 leave
17: 58 pop rax
18: c2 08 00 ret 0x8
= 27 bytes
It would be interesting to see if someone really proficient in "C" or "C++" could rival this function and how they do it.
-
It would be interesting to see if someone really proficient in "C" or "C++" could rival this function and how they do it.
In what respect. Size? Speed? Safety? Portability?
-
Definitely not portability, although once it comes down to APIs even "C" is less portable.
Safety, definitely not, especially the way I code some things, like doing 8- and 16-bit comparisons on memory or a register because I know what the value of the high-order bits will always be.
Size, and as a natural consequence maybe speed. I'm pretty sure there is a combination of coding style, such as using a dereferenced pointer *Dest = 'B' versus an indexed Dest [3] = 'B', and compile options that produces the most efficient code size and/or speed.
-
I read a long time ago somewhere (the Intel manuals?) that stos/lods are relatively slow compared with a simple mov plus inc esi/edi on newer processors. Indeed, debugging a little, I found that my C compiler's output does exactly that.
-
Similarly,
    enter 148, 0
is more weighty (slower) than
    push rbp
    mov rbp, rsp
    sub rsp, 148
and slightly heavier still than the variant using add rsp, -148 in place of the sub.
I have to ask myself, though: what was the rationale for Intel's engineers to design such functionality into the processor? My guess would be that ENTER is 1/3 the size of the conventional method in this example
0: c8 48 14 00 enter 0x1448,0x0
4: 55 push rbp
5: 48 89 e5 mov rbp,rsp
8: 48 81 ec 48 14 00 00 sub rsp,0x1448
and, by that logic, perhaps more efficient speed-wise, at least in this example.
-
I read a long time ago somewhere (the Intel manuals?) that stos/lods are relatively slow compared with a simple mov plus inc esi/edi on newer processors. Indeed, debugging a little, I found that my C compiler's output does exactly that.
The underlying hardware design deviated from the instruction set quite some time ago. Modern x86 is a superscalar architecture. You have to factor in pipelines, microcode, instruction reordering, register renaming, caches, etc.
I am not at all surprised if stos/lods are slower, despite the ability of microcode to even things out. I wouldn't be surprised if compilers, i.e. favoring more generic and RISC-like code, are driving such hardware evolution.
However, while shooting for loop optimization is indeed important, as an assembly language programmer, I'd be more concerned over larger optimization gains that can be had across the entire architecture. Being conscientious of the fact that code and data caches can greatly impact performance seems more relevant. Is something like "rep movsb" the be-all-end-all to data copying? No, but it sure is compact (code cache) and causes predictable (linear and thus easily optimized) data cache access. I am more than content to leave micro optimizations to compilers that care about such wild goose chases ;)
-
I have to ask myself, though: what was the rationale for Intel's engineers to design such functionality into the processor? My guess would be that ENTER is 1/3 the size of the conventional method in this example
...
and, by that logic, perhaps more efficient speed-wise, at least in this example.
At the time that ENTER/LEAVE were conceived, I believe things were measured in Kilobytes and perhaps Megabytes. At the same time, IIRC, ENTER would have taken longer (clock cycles) to perform, technically. Your classic speed-vs-size tradeoff.
Does it still matter now that "clock cycles" are more of an ambiguous and moving target? Well, ENTER has implicit dependencies while the "long" way has explicit dependencies, so I don't see any obvious (simplistic) pipeline optimizations to gain there. However, I would chalk it up akin to my last response, ENTER is another one of those CISC-y instructions that have fallen out of favor in the age of superscalar.
-
Well, this thread and others have steered me onto Agner Fog's material, and I've just read the first 75 of 161 pages of Optimizing Subroutines in Assembly Language. Maybe I'll be able to answer my own question in the near future, but I already have better insight into standards, especially as they apply to calling conventions on the different platforms.
It will be an interesting exercise to see if I can optimise what little I've done already to be more compliant with calling conventions as they apply to 64-bit Linux, while staying size- and speed-sensitive.
-
I really don't know a "vector path" instruction from Adam's off ox, but I'm told that "enter" is slow (compared to discrete instructions) because it's "vector path". Apparently "leave" is "direct path", so is smaller and no slower than discrete instructions. Perhaps if you use that mysterious second parameter to "enter" it catches up again?
I haven't read Agner Fog's optimization guide, but he's got a lot of interesting material. http://www.agner.org - great site!
Best,
Frank
-
I too recommend reading Agner for his excellent discussions on optimization.
-
For AMD64 CPUs, AMD (who'd have guessed) has put out optimization guides:
http://developer.amd.com/Resources/documentation/guides/Pages/default.aspx
They say plain mov's are fastest for memcpy (I read that strcpy is basically copying an array, so memory; but I don't know).
I'd guess Intel has put out some too.