Thursday, July 28, 2016

IPC redux

Anandtech brings us an article today celebrating the 10th anniversary of Intel's Core 2 Duo, a processor line that returned Intel to technological dominance over AMD. I've already looked at processor growth over the years, so it's not really new to me. It's a good article, though, and Ian's articles are some of the best (although Anandtech without Anand Lal Shimpi is like Apple without Steve Jobs).

The dataset he chose for his graphs makes it appear that there has been a real and steady progression in performance when there really hasn't. He's very much aware of the IPC stagnation over the past decade, so it's mainly just a confusing choice of samples.

"Here's this new processor and here's this 10 year old one and look at all these others that fall in between"

But I figured I'd revisit the topic to clarify my own thinking.

Here's a table that I believe expresses the state of CPU advancement more accurately. The data reflect how processors perform on an unoptimized version of one of the oldest benchmarks around, Dhrystone. The Dhrystone program is tiny, about 16KB, so it fits in even modest caches and largely removes the confounding effects of large caches and other subsystems; it's meant to measure pure CPU integer performance, which is what most programs depend on.

Maybe I've posted this table before. I can't be bothered to check and halt this stream of consciousness. Dhrystone data were gathered from Roy Longbottom's site.

Year | CPU         | IPC D2.1 NoOpt | NoOpt IPC chg | Yearly | MHz  | OC clock change | HWBOT OC (MHz) | IPC * Clock change (OC) | Yearly
-----|-------------|----------------|---------------|--------|------|-----------------|----------------|-------------------------|-------
1989 | AMD 386     | 0.11           |               |        | 40   |                 | 40             |                         |
1991 | i486        | 0.19           | 66%           | 33%    | 66   | 180%            | 112            | 365%                    | 182%
1994 | Pentium     | 0.25           | 34%           | 11%    | 75   | 108%            | 233            | 179%                    | 60%
1997 | Pentium II  | 0.45           | 80%           | 27%    | 300  | 124%            | 523            | 304%                    | 101%
2000 | Pentium III | 0.47           | 3%            | 1%     | 1000 | 206%            | 1600           | 214%                    | 71%
2002 | Pentium 4   | 0.14           | -70%          | -35%   | 3066 | 166%            | 4250           | -19%                    | -10%
2006 | Core 2 Duo  | 0.52           | 268%          | 67%    | 2400 | -9%             | 3852           | 234%                    | 58%
2009 | Core i7 930 | 0.54           | 4%            | 1%     | 3066 | 9%              | 4213           | 14%                     | 5%
2012 | 3930K       | 0.51           | -5%           | -2%    | 4730 | 11%             | 4680           | 5%                      | 2%
2013 | 4820K       | 0.52           | 0%            | 0%     | 3900 | -1%             | 4615           | -1%                     | -1%

"IPC D2.1 NoOpt" is a relative measure of how many instructions per cycle each processor is able to complete. Although there are optimized versions of the Dhrystone benchmark for specific processors, it's important to avoid those in a comparison looking purely at CPU performance. Optimizing for benchmarks used to be common practice and definitely gives a cheating kind of vibe.

Looking at the IPC, there were enormous gains in the eighties and nineties, where each new generation could do significantly more work per cycle. Going from a 286 to a 386 to a 486 was a clear upgrade each time, even at the same clock speed (all had 25MHz versions). This pattern was broken with the Pentium III, which was basically Intel taking advantage of the public's expectation that it would outclass the previous generation. I know I was less than thrilled that the legendary Celeron 300A gave many people most of the performance of my pricey hot-rod Pentium III at a fraction of the cost.

The Pentium 4 went even further, such that someone "upgrading" from a Pentium III 1GHz to a Pentium 4 1.3GHz (a P4 1GHz did not exist) would actually end up with a slower computer. And interestingly, the much-vaunted Core i-series did not bring that much to the table over Core 2 in terms of IPC.
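As a back-of-the-envelope sketch, using the table's unoptimized IPC figures and IPC * clock as a crude throughput proxy, that "upgrade" works out roughly like this:

```python
# Crude throughput proxy: unoptimized Dhrystone IPC (from the table) * clock in MHz.
p3 = 0.47 * 1000   # Pentium III 1 GHz  -> ~470
p4 = 0.14 * 1300   # Pentium 4 1.3 GHz  -> ~182
print(f"P4 1.3GHz vs P3 1GHz: {p4 / p3:.0%} of the throughput")   # ~39%
```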

The Pentium III and Pentium 4 acquitted themselves with huge increases in clock speed, which is the marquee feature for CPUs, and it appears that the Core i-series followed the same pattern. But when you look at the clock speeds that the large pool of overclockers at HWBOT average, we see that the Core 2 Duo was capable of some very good speeds. The HWBOT (air/water) numbers are useful for finding the upper limits of clock speed for a given architecture, rather than the artificial limits Intel creates for market segmentation.

There are many people still using Core 2 Duos, and for good reason. While it is certainly slower than the Core i-series, it's not too far off
... particularly if human perception of computing speed is logarithmic, as it seems to be for many phenomena.
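To put a rough number on that, here's a sketch comparing the Core 2 Duo and Core i7 930 rows of the table, again using IPC * overclocked MHz as the throughput proxy; log2 is just a stand-in for "logarithmic perception" and the base is an arbitrary choice of mine:

```python
import math

# Throughput proxy from the table: unoptimized IPC * HWBOT overclocked MHz.
core2 = 0.52 * 3852   # ~2003
i7    = 0.54 * 4213   # ~2275

print(f"linear gap: {i7 / core2:.2f}x")                           # ~1.14x
print(f"'perceived' gap: {math.log2(i7 / core2):.2f} doublings")  # ~0.18
```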
There are many reasons to go for newer architectures, e.g., lower power use, higher clock speeds for locked CPUs, larger caches, additional instruction sets, Intel's lock-in philosophy, etc. In retrospect, we can see the characteristic S-curve (or a plateau, if you are looking at the log graph) of technology development, so it all makes sense - the Pentium 4 being a stray data point. An aberration. Something to dismiss in a post-study discussion or hide behind a prominent line of best fit. There, so simple. Companies pay millions to frauds like Gartner Group for this kind of analysis! But you, singular, the reader, are getting it for free.

At the time, however, it seemed like the performance gains would go on forever, with exotic technologies neatly taking the baton, whereas the only technologies I see making a difference now - at least for CMOS - are caching-related. Someday CPUs will fetch instructions like Kafka's Imperial messenger, forever getting lost in another layer of cache. But if Zen turns to Zeno, it won't matter; Dhrystone will expose it all.

Kurzweil talks about S-curves being subsets of other S-curves, like a fractal of snakes, and maybe that's where we're at. Or is his mind the product of the 1970-2000 period of Moore's Law, forever wired to think in exponential terms?



