Friday, March 31, 2017

Moore's Law: An Originalist Perspective

The reports of the death of Moore's Law have been greatly exaggerated. At least that's what Intel hoped you would believe after its March 28th industry advertorial blitz, Technology and Manufacturing Day. There are many formulations of Moore's Law, but here it is as stated by Gordon Moore himself:
The number of transistors and resistors on a chip doubles every 24 months
It's a little tricky because you can always add more cores, or even fluff like a GPU or fixed-function accelerators, to increase transistor counts, even if yields degrade and programs that can use these resources are rare. The law doesn't make an exception for these or, more pertinently, for multicore designs, so let's see how the biggest chips from Intel compare over the past 20 years. I'll look at GPU transistor counts as well for comparison. My guess is that transistor counts really are doubling every two years, but for Nvidia rather than Intel.

An hour of googling later:

Man, I hate wccftech so much. How such a garbage rumor site constantly gets top billing in Google searches is a mystery. It's almost a reverse hierarchy: rumor trash like wccftech and Motley Fool on top, data-dredged (but still useful) SEO clickbait like CPUBoss next, followed by amateur enthusiast sites like AnandTech and Tom's Hardware, then professional sites like The Next Platform, and finally, buried underneath, expert but accessible sites like RealWorldTech, run by David Kanter of the Linley Group's Microprocessor Report.

Anyway here are the tables.* First GPU:

Date | GPU Name | Transistor Count (millions) | % Change over Previous | Expected (millions)
April 1997 | Riva 128 | 3.5 | n/a | 3.5
March 1999 | Riva TNT2 | 15 | 328.57% | 7.00
February 2001 | GeForce 3 | 63 | 320.00% | 14.00
January 2003 | GeForce FX5800 Ultra | 125 | 98.41% | 28.00
June 2005 | GeForce 7800GTX | 302 | 141.60% | 56.00
May 2007 | GeForce 8800 Ultra | 681 | 125.50% | 112.00
January 2009 | GeForce GTX 285 | 1400 | 105.58% | 224.00
May 2011 | GeForce GTX 580 | 3000 | 114.29% | 448.00
May 2013 | GeForce GTX Titan | 7100 | 136.67% | 896.00
May 2015 | GeForce GTX Titan X | 12000 | 69.01% | 1,792.00
1H 2017? | Titan Volta | ? | |
Wow! GPU workloads are more parallel, but it's clear Nvidia is outpacing Moore's Law by a comfortable margin. It's not quite Kurzweil-style "accelerating change", but Nvidia could stagnate for four years and still be ahead. I'd say good job, Nvidia, but they've already taken a ton of my money, which is all they ever really wanted anyway.
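The "Expected" column is just the 1997 count doubled once per two-year step, i.e. N(t) = N₀ · 2^((t − t₀)/2). A minimal sketch that recomputes the percentage and expected columns from the table's own GPU numbers, assuming (as the table does) that the launches sit exactly two years apart, and then turns the 2015 lead into years of headroom: log2(actual / expected) doublings times two years per doubling. The function names are mine, for illustration only.

```python
import math

# GPU transistor counts (millions) from the table above.
# Assumption: each row is one two-year step, as the "Expected" column treats them.
gpus = [
    ("Riva 128", 3.5),
    ("Riva TNT2", 15),
    ("GeForce 3", 63),
    ("GeForce FX5800 Ultra", 125),
    ("GeForce 7800GTX", 302),
    ("GeForce 8800 Ultra", 681),
    ("GeForce GTX 285", 1400),
    ("GeForce GTX 580", 3000),
    ("GeForce GTX Titan", 7100),
    ("GeForce GTX Titan X", 12000),
]

YEARS_PER_DOUBLING = 2.0


def moore_table(chips):
    """Return (name, count, % change over previous, expected count) rows."""
    base = chips[0][1]
    rows = []
    for step, (name, count) in enumerate(chips):
        pct = None if step == 0 else (count / chips[step - 1][1] - 1) * 100
        expected = base * 2 ** step  # one doubling per two-year step
        rows.append((name, count, pct, expected))
    return rows


def years_of_headroom(actual, expected):
    """Years a chip could stagnate before a two-year doubling curve catches up."""
    return math.log2(actual / expected) * YEARS_PER_DOUBLING


rows = moore_table(gpus)
for name, count, pct, expected in rows:
    pct_text = "n/a" if pct is None else f"{pct:.2f}%"
    print(f"{name:<22} {count:>8} {pct_text:>9} {expected:>9.2f}")

last_name, last_count, _, last_expected = rows[-1]
print(f"{last_name}: {years_of_headroom(last_count, last_expected):.1f} years ahead")
```

With these numbers the Titan X lands about 5.5 years ahead of the doubling curve, so "stagnate for four years" is, if anything, conservative.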

Now CPU:

Date | CPU Name | Transistor Count (millions) | % Change over Previous | Expected (millions) | Core Count
May 1997 | Pentium II Klamath | 7.5 | n/a | 7.5 | 1
February 1999 | Pentium III Katmai | 9.5 | 26.67% | 15 | 1
April 2001 | Pentium 4 | 42 | 342.11% | 30 | 1
March 2003 | Pentium M Banias | 77 | 83.33% | 60 | 1
May 2005 | Pentium D Smithfield | 230 | 198.70% | 120 | 1
April 2007 | Core 2 Kentsfield MCM | 586 | 154.78% | 240 | 2
March 2009 | Xeon Gainestown (W5580) | 750 | 27.99% | 480 | 2
April 2011 | Xeon Westmere (E7-8870) | 2600 | 246.67% | 960 | 10
June 2013 | Xeon Ivy Bridge (E5-2697 v2) | 2890 | 11.15% | 1920 | 12
May 2015 | Xeon Haswell (E7-8890 v3) | 5600 | 93.77% | 3840 | 18
1H 2017? | Xeon Kaby | ? | | |

Well, there it is: Intel really has kept Moore's Law on track! Granted, the samples from Westmere onward are pricey many-core server parts rather than consumer chips, but they aren't one-off tech demos either.
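Rather than eyeballing each two-year step, another way to check the Intel numbers is to fit a straight line to log2(transistor count) against launch date: the slope is doublings per year, and its reciprocal is the doubling time. A minimal least-squares sketch using only the CPU rows above; converting month names to fractional years is my own simplification, and the fit ignores core counts and price, just like the formulation of the law quoted earlier.

```python
import math

# (year, month, transistor count in millions) for the CPU rows in the table above.
cpus = [
    (1997, 5, 7.5),    # Pentium II Klamath
    (1999, 2, 9.5),    # Pentium III Katmai
    (2001, 4, 42),     # Pentium 4
    (2003, 3, 77),     # Pentium M Banias
    (2005, 5, 230),    # Pentium D Smithfield
    (2007, 4, 586),    # Core 2 Kentsfield MCM
    (2009, 3, 750),    # Xeon Gainestown (W5580)
    (2011, 4, 2600),   # Xeon Westmere (E7-8870)
    (2013, 6, 2890),   # Xeon Ivy Bridge (E5-2697 v2)
    (2015, 5, 5600),   # Xeon Haswell (E7-8890 v3)
]


def doubling_time_years(points):
    """Least-squares fit of log2(count) vs. time; returns years per doubling."""
    xs = [year + (month - 1) / 12 for year, month, _ in points]
    ys = [math.log2(count) for _, _, count in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx  # doublings per year
    return 1 / slope


print(f"Fitted doubling time: {doubling_time_years(cpus):.2f} years")
```

On these ten chips the fit comes out a little under two years per doubling (roughly 1.8), which agrees with the table: the biggest Xeons have stayed at or above the 24-month line.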

Moore's Law doesn't say anything about price or performance either, which is why it technically doesn't matter that the E7-8890 v3 launched at $7,200 whereas the Pentium II cost $1,200 in 2017 dollars.** Nor does it matter that the 8890 v3 is an 18-core chip that will be no faster than a one- or two-core version for most programs.

Anyway, I stand corrected. Or do I? Intel's answer to whether Moore's Law is dead was this chart:

The cost-per-transistor chart proves nothing except that even Intel isn't immune to imputing different intents to Moore's Law, i.e., "What does a biennial doubling of transistor counts per chip mean?"

For many years, increasing transistor counts meant a direct increase in performance, which is why there was a big difference between a 286, 386, 486, etc. Those gains are gone. Intel's presentation points out how it can pack more transistors per unit area than ever before and how cost per transistor has gone way down thanks to its $10 billion fabs, although you wouldn't realize it given its SKU pricing over the past decade.***

Intel's cost savings haven't translated into customer savings, although the incidental benefits of lower power consumption and better frequencies for low-power parts have. In both the consumer and high-end markets, the lack of real competition and the resulting price gouging are no secret. Thankfully, Zen should bring sanity back to the market, so Intel will probably have to cut the slide touting its 60%+ gross margins from its next Technology and Manufacturing Day.

But the onslaught of Naples and Ryzen won't bring back the old days of huge IPC increases. Maybe it brings large price drops across the board, with the $7,200, 5.6-billion-transistor E7 priced closer to the $800, 12-billion-transistor 1080 Ti. Big whoop.
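To put a number on that gap, the sketch below divides the prices quoted in this post by transistor count. It uses only the figures above (street prices vary, and neither figure is inflation-adjusted), so treat the ratio as rough.

```python
# Prices (USD) and transistor counts (billions) as quoted in the paragraph above.
chips = {
    "Xeon E7-8890 v3": (7200, 5.6),
    "GeForce GTX 1080 Ti": (800, 12.0),
}


def dollars_per_billion(price_usd, transistors_billion):
    """Price divided by transistor count, in dollars per billion transistors."""
    return price_usd / transistors_billion


for name, (price, count) in chips.items():
    print(f"{name}: ${dollars_per_billion(price, count):,.2f} per billion transistors")

xeon = dollars_per_billion(*chips["Xeon E7-8890 v3"])
gpu = dollars_per_billion(*chips["GeForce GTX 1080 Ti"])
print(f"The Xeon costs {xeon / gpu:.1f}x more per transistor")
```

By these figures the Xeon costs roughly 19 times more per transistor than the GPU.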

The single-thread performance leader has never been the chip with a huge core count. And there are still architectural advantages unique to AMD, Intel, and IBM that could add up to real improvements in future chips using existing technology. Then there are not-yet-existing technologies that promise the world: the Popular Science/IEEE Spectrum sort of "breakthroughs" that you stopped paying attention to years ago because they were beyond vaporware. Plasmaware?

Photonics, neural networks, quantum computing. Maybe those really are the future, but I'd rather see effort put into proven methods for improving performance: extreme cooling, and packages, processes, and IT infrastructure optimized for superconducting temperatures. I'm just being Cray Cray.

* I didn't factor in paper launches, custom runs, or more exotic chips like Tesla and Phi. Currently, the 72-core Xeon Phi 7290 holds the record as Intel's biggest chip, with over 8 billion transistors. On Nvidia's side is the 15.3-billion-transistor P100. Neither is that much larger than the "mainstream" parts.

** Back then, the Pentium Pro was better thanks to its faster cache, despite being an older design. That imprinted the importance of cache on me like a little baby duck. Caches used to be in a separate socket, then moved onto the CPU board as a separate chip, and then onto the CPU itself. But there's still a case to be made for caches that are physically separate from the CPU but still closer than RAM, as with eDRAM. For desktops, where power consumption isn't important, it would be great to see a socketed cache, maybe on the underside of the CPU package, made of good ol' fashioned SRAM.

Make liquid cooling the standard for high-performance computers so RAM can sit even closer. We're already at the point where the highest memory frequencies are reached on boards whose RAM channels and traces are shorter than those of comparable motherboards.

*** One thing I'd really like to see adopted is Intel's definition of transistor density for differentiating processes. While I'm not sure about weighting NAND cells at 60% and scan flip-flop cells at 40%, it's a whole lot better than the "14nm" and "10nm" marketing that isn't fooling anyone. Actually, I think a raw maximum transistors per square millimeter, plus some sort of confidence interval for the maximum frequencies achievable at various temperatures, would be best.
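As I understand Intel's proposal, the metric is a weighted sum of two cell densities: 0.6 × (NAND2 transistor count / NAND2 cell area) + 0.4 × (scan flip-flop transistor count / scan flip-flop cell area). A minimal sketch of that arithmetic follows. A two-input CMOS NAND has 4 transistors, but the flip-flop transistor count and both cell areas in the example are made-up placeholders, not Intel's figures.

```python
# Weights as described in the footnote above: 60% NAND2, 40% scan flip-flop.
NAND2_WEIGHT = 0.6
SFF_WEIGHT = 0.4


def weighted_density_mtr_per_mm2(nand2_transistors, nand2_area_um2,
                                 sff_transistors, sff_area_um2):
    """Weighted logic density in millions of transistors per mm^2.

    1 transistor per um^2 equals 10^6 transistors per mm^2, so a density in
    transistors/um^2 is numerically the same as MTr/mm^2.
    """
    nand2_density = nand2_transistors / nand2_area_um2  # transistors per um^2
    sff_density = sff_transistors / sff_area_um2        # transistors per um^2
    return NAND2_WEIGHT * nand2_density + SFF_WEIGHT * sff_density


# Hypothetical cells: a 4-transistor NAND2 in 0.05 um^2 and a
# 24-transistor scan flip-flop in 0.5 um^2 (placeholder values).
print(f"{weighted_density_mtr_per_mm2(4, 0.05, 24, 0.5):.1f} MTr/mm^2")
```

The appeal over node names is that the output is a physical quantity you can compare across foundries, even if the 60/40 weighting is a judgment call.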

Wow, all these cool ideas that I have no ability to implement. I'll design the logo.

Friday, March 24, 2017

Flash down, Javascript to go ...

I didn't notice when Google set Chrome to block Flash by default. It's been years since I've played a Flash game, which, I suspect, is the only use I or anyone else has had for the technology in recent memory.

The ecosystem of Flash-based games was impressive in a kind of eighties and nineties way. Back then, it was more common for programs to be authored by a single person, which made for some idiosyncratic experiences: different user interfaces, different approaches to solving a problem, different methods for accessing and extending program functions. Windows and Flash homogenized some of that, but single-author programs and games still feel very different.

Apple famously blocked Flash on the iPhone, citing security and performance concerns, but the security issue was overblown: iOS, Windows, Linux, Android, Tor, LastPass, antivirus software have all been compromised. The performance issue, however, was real, and it was a miracle that Flash games ran at all.* Even Flash advertisements could bog down a browsing session, which was a big incentive to use an ad blocker.

Most website designers never really learned that pop-ups, pop-unders, autoplaying videos with sound, animated Flash ads, affiliate redirects, advertorial sections, and the like are never going to be accepted. Rather than redesigning their sites to use unobtrusive, locally hosted linked images the way TechPowerUp does when it encounters an adblock user, sites are increasingly doing this:

This behavior isn't new, and I wrote a while ago about how to bypass some of these types of page elements.

My first response is to block Javascript using the little ⓘ symbol next to the URL before resorting to manually blocking page elements. Sometimes it's easiest to just hit Escape or click away if it's an obscure site you doubt you'll visit again, but this particular method of interfering with site access is so pervasive that I've switched to globally blocking Javascript and whitelisting sites.
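The same global-block-plus-whitelist setup can also be expressed as Chrome enterprise policy instead of clicking through settings. A minimal sketch, assuming Chrome's `DefaultJavaScriptSetting` policy (where 2 means block) and its `JavaScriptAllowedForUrls` list; it only generates the JSON. On Linux, managed policy files live under /etc/opt/chrome/policies/managed/, while Windows and macOS use the registry and configuration profiles instead. The whitelisted sites are placeholders.

```python
import json

BLOCK_JAVASCRIPT = 2  # DefaultJavaScriptSetting value meaning "block everywhere"

# Placeholder whitelist; "[*.]" is meant to match a domain and all its subdomains.
ALLOWED_SITES = ["[*.]example.com", "[*.]example.org"]


def chrome_js_policy(allowed_sites):
    """Build a policy dict that blocks Javascript except on whitelisted sites."""
    return {
        "DefaultJavaScriptSetting": BLOCK_JAVASCRIPT,
        "JavaScriptAllowedForUrls": list(allowed_sites),
    }


# Print the JSON that would go into a managed policy file.
print(json.dumps(chrome_js_policy(ALLOWED_SITES), indent=2))
```

Check chrome://policy after installing the file to confirm Chrome actually picked it up.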

The main issue here is that Javascript is still required for many sites. Page formatting is also usually more primitive in non-Javascript versions.**

At some point, I can imagine website designers requiring Javascript, at which point the next step will be to switch to Firefox and install AdNauseam, or even to switch to Opera.

* I remember that most Flash games would get bogged down quickly at later levels, even on powerful hardware, as more things happened on the screen. Setting Flash to "low quality" didn't fix much either. And the games were very primitive. The equivalent DOS game would have run fluidly on a 50MHz machine, but the Flash version would need well over ten times the computing power.

** I did notice that Google's non-Javascript search includes limited chronological and verbatim filtering directly on the results page, which is maybe its only advantage over the regular search.