Author Topic: Why NOT to use "C" (Read 33934 times)

TightCoderEx · « **on:** July 07, 2012, 10:41:12 AM »

Every once in a while I'll drill into a library to see how these highly optimized compilers work, or at least what they produce. I am cognisant of monetary factors and, as Agner Fog points out, ASSEMBLY can be very error prone, but this just boggles the mind: how can any compiler technology bring this about?

In libncurses.so.5.9 @ A5C0 <mvinsstr> I came across this:

Code:

00  48 89 5C 24 F0        mov [rsp-0x10],rbx
05  48 89 6C 24 F8        mov [rsp-0x8],rbp
0A  48 83 EC 18           sub rsp, 0x18

Most of us at a quick glance will immediately see this is the same as

Code:

00  55           push    rbp
01  53           push    rbx
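The "same as" claim can be checked with a small model. The following is only a sketch: the starting rsp and the register values are made-up numbers, and memory is a plain dictionary. It suggests the two sequences put rbx and rbp in the same slots, but the compiler's version leaves rsp 8 bytes lower (0x18 versus 0x10), so a strict equivalent would also need a "sub rsp, 8". What those extra 8 bytes are for (a local, alignment padding) cannot be told from the listing.

```python
# Toy model of the two prologues. START, RBX and RBP are hypothetical
# values chosen only for illustration; memory is a dict of address -> value.

def compiler_prologue(rsp, rbx, rbp):
    """mov [rsp-0x10],rbx / mov [rsp-0x8],rbp / sub rsp,0x18"""
    mem = {}
    mem[rsp - 0x10] = rbx
    mem[rsp - 0x08] = rbp
    rsp -= 0x18
    return rsp, mem


def push_prologue(rsp, rbx, rbp, scratch=0):
    """push rbp / push rbx, optionally followed by sub rsp,scratch"""
    mem = {}
    for value in (rbp, rbx):
        rsp -= 8            # push decrements rsp by 8 first ...
        mem[rsp] = value    # ... then stores at the new top of stack
    rsp -= scratch
    return rsp, mem


START = 0x7FFF_0000
RBX, RBP = 0x1111, 0x2222

rsp_cc, mem_cc = compiler_prologue(START, RBX, RBP)
rsp_push, mem_push = push_prologue(START, RBX, RBP)
rsp_push8, mem_push8 = push_prologue(START, RBX, RBP, scratch=8)

print(mem_cc == mem_push)     # True: saved registers land in the same slots
print(hex(START - rsp_cc))    # 0x18
print(hex(START - rsp_push))  # 0x10
print(rsp_cc == rsp_push8)    # True: equal once 8 scratch bytes are added
```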

Are compilers the real deal? Well, I think the proof is in the pudding, and IMHO knowing your architecture, knowing your instruction set and doing it in ASSEMBLY is the only way to get truly optimized code. The fact that I'm a long way from meeting these criteria doesn't negate the legitimacy of the facts.

There, I had my Sat morning rant, feel much better now.

Frank Kotler · « **Reply #1 on:** July 07, 2012, 08:11:55 PM »

In principle, I agree with you 100% (or more).

However... it has come to my attention that the guys on news:comp.arch differentiate between the "architecture" and the "hardware". "Architecture" being the instruction set that we have access to, and "hardware" being the "micro-code" which the "risc core" actually executes (and which is, AFAIK, "proprietary" - a "trade secret").

From this viewpoint, "push" is of course a "complex instruction". The "hardware" actually has to execute something roughly like:

Code:

sub rsp, 8
mov [rsp], rbp
sub rsp, 8
mov [rsp], rbx

One thing that really jumps out at me in the compiler's code is that the "sub" is done last. An "old timer" would never ever touch anything below sp. An interrupt could occur at any time, using the same sp, and could trash anything we put below sp. However, a modern architecture/hardware/OS (I'm not sure which actually determines this) uses a "different stack" for interrupts, so it is "safe" to use the area under esp/rsp - we can even use esp/rsp as a general purpose register if the need is dire (saving/restoring it somewhere - not on the stack, obviously).

In terms of "dependencies", if the "sub" were put first the other instructions would need to wait for the new value of rsp to be known (the "sub" executed, not necessarily "retired") before the "mov"s could even begin. We have multiple "execution units" even on a single core, and "out-of-order execution" to contend with!
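The dependency argument can be illustrated with a deliberately crude model. It assumes every instruction takes one cycle and can start as soon as the registers it reads are ready; memory, issue width and any special stack-pointer handling in real hardware are ignored. The names `critical_path`, `STORES_FIRST`, `SUB_FIRST` and `PUSHES` are hypothetical, and the numbers are an illustration of the reasoning, not a measurement of any CPU.

```python
# Toy dependency model: unit latency, unlimited execution units.
# Each instruction is (registers_read, registers_written), in program order.

def critical_path(instructions):
    """Return the cycle at which the last instruction's result is available."""
    ready = {}   # register -> cycle at which its latest value is available
    finish = 0
    for reads, writes in instructions:
        start = max((ready.get(r, 0) for r in reads), default=0)
        done = start + 1
        for w in writes:
            ready[w] = done
        finish = max(finish, done)
    return finish


STORES_FIRST = [
    (("rsp", "rbx"), ()),        # mov [rsp-0x10], rbx
    (("rsp", "rbp"), ()),        # mov [rsp-0x8], rbp
    (("rsp",), ("rsp",)),        # sub rsp, 0x18
]
SUB_FIRST = [
    (("rsp",), ("rsp",)),        # sub rsp, 0x18
    (("rsp", "rbx"), ()),        # mov [rsp+0x8], rbx
    (("rsp", "rbp"), ()),        # mov [rsp+0x10], rbp
]
PUSHES = [
    (("rsp", "rbp"), ("rsp",)),  # push rbp
    (("rsp", "rbx"), ("rsp",)),  # push rbx
]

print(critical_path(STORES_FIRST))  # 1: nothing waits on a new rsp
print(critical_path(SUB_FIRST))     # 2: both stores wait for the sub
print(critical_path(PUSHES))        # 2: second push waits for the first
```

In this model the compiler's ordering lets all three instructions start together, while "sub" first or a pair of pushes forms a two-step chain through rsp.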

I had occasion to test this - AMD K5 vs K6 I think, but don't hold me to that. I was accused of "pretending to be surprised" at the result. I wasn't surprised that "push" had become slower than "mov", but I was surprised by how much. As I recall, on the older hardware they were about the same - with a slight advantage to "push". On the newer hardware "push" took more than twice as long!

So as much as it pains me to say it, I think the compiler's code may be better than what you or I would (probably) write. There's a tradeoff to this. If we can get more "business" done per cache-line, cache, or page, we may be able to avoid the very slow operation of reloading any of these. I don't know where (if ever) the "win" occurs.
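The size side of that tradeoff is simple arithmetic on the encodings quoted in the first post. The sketch below just counts bytes; the 64-byte cache line is an assumption (typical for x86-64 parts, not something stated in this thread), and `code_size` is a hypothetical helper name.

```python
# Encodings copied from the listings in the first post.
COMPILER_SEQ = ["48 89 5C 24 F0", "48 89 6C 24 F8", "48 83 EC 18"]
PUSH_PAIR = ["55", "53"]

CACHE_LINE = 64  # bytes; assumed, not taken from the thread


def code_size(encodings):
    """Total byte count of a list of space-separated hex encodings."""
    return sum(len(bytes.fromhex(e)) for e in encodings)


print(code_size(COMPILER_SEQ))                # 14
print(code_size(PUSH_PAIR))                   # 2
print(CACHE_LINE // code_size(COMPILER_SEQ))  # 4 copies per assumed line
print(CACHE_LINE // code_size(PUSH_PAIR))     # 32 copies per assumed line
```

So the compiler's prologue is seven times the size of the push pair; whether that ever costs more than the dependency chain saves is exactly the open question.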

I pretty much gave up on optimizing for speed when I realized that they changed the rules with every generation of CPU. I recall reading (but not where) that "push" has been made fast again on very recent hardware. A compiler, in the hands of a competent author/maintainer, can keep up with these changes - and optimize for older hardware instead, if specified. When optimizing for size, it's easy to "keep score" with a simple "ls -l" or equivalent. My attempts to time code have been... inconclusive. I concentrate on "do everything that needs to be done, and nothing that doesn't need to be done". Works for me.

Don't get me wrong, I "like" asm better than any HLL and will probably never abandon it. But you don't get the advantages from it that you did in the "good old days". One thing I've learned is that change happens. Whether you like it or not (I frequently don't), you'd better learn to cope with it. One way to cope is to ignore any change you don't like, when and if possible.

Hope I didn't spoil your rant!

Best,
Frank

TightCoderEx · « **Reply #2 on:** July 08, 2012, 01:24:04 AM »

Well, I had just spent 45 minutes formulating what I thought was a pretty intelligent response, only to lose it by navigating to another window in the same tab. The irrefutability of your responses renders argument moot, but all I had composed was an embellishment upon "Whether you like it or not". The fact remains, I/we can rant all we want, the machine will move in the direction it wants.

A rant now and then is good for blood pressure, and now I will retreat to that little world of mine, where the only OS is Linux and everybody programs with NASM.

Frank Kotler · « **Reply #3 on:** July 08, 2012, 02:21:07 AM »

See ya there!

(although I'm still in the 32-bit subset of that universe)

Best,
Frank

Rob Neff · « **Reply #4 on:** July 08, 2012, 05:53:22 PM »

To append to the current discussion: Modern-day compilers have so many command-line switches and intrinsics that, in order to obtain the optimized performance desired, it requires a deep understanding of their effect on the code. It may be argued whether it is better to use the compiler architecture or assembly.

I've long given up on trying to keep pace with all the latest optimizations and timings. Similar to a Linux installation today compared to those of years ago, I rarely find the need to drill down to the nitty-gritty anymore. Maybe it's just my old age or a been-there done-that attitude.

TightCoderEx · « **Reply #5 on:** July 08, 2012, 07:00:14 PM »

Quote from: Rob Neff on July 08, 2012, 05:53:22 PM

it requires a deep understanding of their effect on the code.

Which then brings into question: in order to do that, you need to understand every nuance of the instruction set and architecture, so why would one want to add another layer, learning the idiosyncrasies of a compiler?

I often wonder if it comes down to one fundamental premise, and that is the need of Homo sapiens to control. This can only be done in one of two ways: have knowledge and keep it secret, or make it so complex that rational human beings don't want to deal with it. Case in point, our legal system. How ten simple commandments have turned into thousands of volumes that need years of study just to get an inkling of an idea boggles the mind. Oh well, at least the wheel and lever seem to be pretty safe.

Frank Kotler · « **Reply #6 on:** July 08, 2012, 08:14:18 PM »

Quote

There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.

- C.A.R. Hoare

I fear that we're tending towards the "easy way".

One consequence is that we really can't tell beginners to observe the compiler's assembly output to help learn asm. It used to be a good idea at one time, but now... the compiler's output may be fast, but it isn't easy to understand!

Maybe things were really "better in the good old days" or maybe I'm just getting grumpy... maybe both?

Best,
Frank

NASM - The Netwide Assembler
