June 6th, 2007

The death of single threaded development

Posted by George Ou @ 1:02 am

Categories: AMD, Desktop, Development, Hardware, Intel, Servers

Tags: Development, Death, George Ou

Applications used to get free performance boosts as the clock speeds of the CPU and memory bus climbed at an exponential rate.  For two decades, applications just magically doubled in speed every two years without any need to redesign the code, but the era of “free lunch” performance boosts is over.  The end began around 2003, when CPU makers hit the 3 GHz thermal wall in microprocessors.  Since then, the transition to newer CPU micro-architectures has delivered some modest per-core gains through execution optimization, even though clock speeds are lower, but nothing like the old days.  The lion’s share of progress within the microprocessor industry over the last two years has come from the shift to multi-core processors, first dual core and now quad core.  From this point forward we’re only going to see a multiplication of CPU cores at relatively fixed clock speeds, mostly in the 2 to 3 GHz range and maybe eventually close to 4 GHz on some premium products.

The consequence of this seismic shift in microprocessor development is that traditional single-threaded applications will no longer see any significant gains in performance, let alone exponential gains.  A typical single-threaded application will probably not be that much faster 8 years from now, even if there are 16 times as many CPU cores.  That’s because even if we had 32-core CPUs, a single-threaded application would only be able to use 1 of the 32 cores while the other 31 sit idle.  Of course, some people might wonder whether it would be better to just keep scaling the clock speeds of single-core processors and have something like a 20 GHz processor.  Yes, that would be the ideal solution, but it simply can’t be done short of using insane amounts of power and exotic liquid nitrogen cooling systems.  The entire microprocessor industry was forced to shift to putting more cores into a CPU instead of ramping up the clock speed.

Scaling applications to take advantage of all the processing power in those extra cores requires a fundamental shift in the way programs are written.  This style of programming is called “multithreaded programming” or “parallel programming”.  Here are a few ways to tackle multithreaded programming:

  • Use multithread optimized libraries
  • Use multithreaded development APIs like OpenMP and pthreads
  • Use automated parallelization and vectorization compilers
  • Hand threading (manual threading)

Multithread optimized libraries:
One of the easiest ways to do multithreaded programming is to take advantage of multithread optimized libraries.  The latest Intel compiler 10.0 comes with Intel’s MKL (Math Kernel Library) and multimedia processing functions, which are optimized to run on multi-core processors with concurrent threads.  The hard work has already been done, and the developer merely takes advantage of what has already been written.  Since math and science functions and multimedia processing have some of the heaviest computation requirements, these libraries and functions are a huge boost to developers.

Multithreaded development APIs:
OpenMP is a multithreaded development API designed to make multi-core optimization easier than manual “hand threading”.  OpenMP automates much of the work of multithreaded parallel processing on multi-core processors, and sometimes it even scales better than hand threading.  Intel’s Director of Marketing James Reinders explained to me that one might see a 400 to 500 percent performance gain over a single-threaded application on an 8-core processor.  Since running 8 times as fast, which is a 700% gain, is the theoretical maximum on an 8-core computer, a 500% gain from automated multithreading is extremely tempting because it saves the programmer from having to manually chop up the workload among multiple CPU cores.
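
The percentages above are easy to misread, so here is the arithmetic as a small Python sketch.  It assumes ideal linear scaling (N cores run N times as fast) as the theoretical ceiling; the function names are made up for this illustration and are not part of OpenMP or any Intel tool.

```python
def gain_to_speedup(gain_percent: float) -> float:
    """Convert a percentage gain over the baseline into a speedup factor."""
    return 1.0 + gain_percent / 100.0


def max_gain_percent(cores: int) -> float:
    """Ideal linear scaling: N cores run N times as fast, a gain of (N - 1) * 100%."""
    return (cores - 1) * 100.0


def parallel_efficiency(gain_percent: float, cores: int) -> float:
    """Fraction of the ideal linear speedup that was actually achieved."""
    return gain_to_speedup(gain_percent) / cores


if __name__ == "__main__":
    cores = 8
    print(f"ideal gain on {cores} cores: {max_gain_percent(cores):.0f}%")
    for gain in (400, 500):
        print(f"{gain}% gain = {gain_to_speedup(gain):.0f}x speedup, "
              f"{parallel_efficiency(gain, cores):.1%} of ideal")
```

By this reckoning a 400 to 500 percent gain means the program runs 5 to 6 times as fast, or 62.5% to 75% of what 8 cores could deliver under perfect scaling.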

Automated parallelization and vectorization compilers:
The new parallelization optimizations in the latest Intel compiler 10.0 allow applications that haven’t been coded with any multithreading in mind to get small boosts on multi-core computers.  These optimizations typically take loops in a program and try to divide the iterations across multiple CPU cores.  This isn’t limited to simple “for” or “do while” loops; it also applies to more complex loop structures.  The typical gains are modest, usually single-digit or low double-digit percentages.  While that isn’t a lot, it is essentially a free boost that comes from a simple compiler switch on all existing code.  You can try it with and without the switch and see if it makes a difference in your application without modifying the code.  These parallelization and vectorization techniques have gotten a lot of press lately, but they don’t even come close to replacing OpenMP or hand threading.
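
To make “dividing a loop across cores” concrete, here is a conceptual sketch in Python.  It is only a stand-in for the idea, not what the Intel compiler emits: `body`, `serial_loop` and `divided_loop` are hypothetical names, and the transformation is only safe because each iteration is independent of the others.  Note also that in standard CPython builds the global interpreter lock keeps pure-Python arithmetic from running on several cores at once, so this shows the shape of the transformation rather than a real speedup.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional


def body(i: int) -> int:
    """Hypothetical loop body; each iteration is independent of the others."""
    return i * i


def serial_loop(n: int) -> List[int]:
    # The original single-threaded loop: one core runs every iteration in order.
    return [body(i) for i in range(n)]


def divided_loop(n: int, workers: Optional[int] = None) -> List[int]:
    # The same loop with its iterations handed out to a pool of worker threads.
    # map() returns results in iteration order, whichever worker ran them.
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(body, range(n)))


if __name__ == "__main__":
    # Both versions must produce identical results for the split to be valid.
    print(serial_loop(10) == divided_loop(10, workers=4))
```

The source code of the loop body is untouched in both versions, which is the appeal of the automated approach: the division of work is decided by the tool, not the programmer.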

Hand threading (manual threading):
Hand threading is a manual process where the developer decides exactly how to break up a workload across multiple CPU cores, and it can scale perfectly when done right.  With enough time and skill at one’s disposal, hand threading should always beat OpenMP on performance, but the skills needed for multithreaded programming are a very rare commodity.  The demand for skilled multithread programmers is huge, and it isn’t something your run-of-the-mill programmer can do.  For more on parallel programming, here’s a great article by Herb Sutter and James Larus.
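
As a rough illustration of what “deciding exactly how to break up a workload” involves, here is a minimal Python sketch that splits a sum of squares across threads by hand.  The helper names `chunk_bounds` and `threaded_sum_of_squares` are invented for this example, and the same caveat applies as with any CPython threading demo: the global interpreter lock in standard builds means it demonstrates the bookkeeping (partitioning, per-thread results, joining) rather than a real multi-core speedup.

```python
import threading
from typing import List, Sequence


def chunk_bounds(n_items: int, n_threads: int) -> List[range]:
    """Split indices 0..n_items-1 into n_threads contiguous, near-equal ranges."""
    base, extra = divmod(n_items, n_threads)
    bounds, start = [], 0
    for t in range(n_threads):
        size = base + (1 if t < extra else 0)  # first `extra` chunks get one more
        bounds.append(range(start, start + size))
        start += size
    return bounds


def threaded_sum_of_squares(data: Sequence[int], n_threads: int = 4) -> int:
    partial = [0] * n_threads  # one slot per thread, so no lock is needed

    def worker(slot: int, indices: range) -> None:
        total = 0
        for i in indices:
            total += data[i] * data[i]
        partial[slot] = total  # each thread writes only its own slot

    threads = [
        threading.Thread(target=worker, args=(slot, indices))
        for slot, indices in enumerate(chunk_bounds(len(data), n_threads))
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every chunk before combining the partial sums
    return sum(partial)


if __name__ == "__main__":
    print(threaded_sum_of_squares(list(range(1000)), n_threads=4))
```

Everything the automated approaches hide is visible here: the programmer picks the chunk sizes, keeps the threads from stepping on each other’s results, and waits for all of them before combining.  Getting those decisions right is where the skill comes in.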

What scales and what doesn’t:
The most obvious examples of perfect multi-core scaling are 3D rendering and multimedia encoding applications, all of which require a lot of processing time and have the most to gain.  Server applications also tend to scale fairly well because, by their very nature, they have a lot of concurrent and independent tasks to handle, which can be divided up across multiple CPU cores.

The difficulty lies in getting games to scale well on multi-core processors.  Office productivity applications are another category that generally doesn’t scale well, because there’s very little developer experience with multi-CPU computers on the desktop platform.  Furthermore, you only have a single user generating the workload, and that’s a lot harder to chop up than in the server environment, where you can just assign different user sessions to different CPU cores.  Office productivity performance is also less of an issue, since you can’t possibly need that much more performance for mundane office tasks until computers start requiring more human-friendly interfaces.  Voice dictation, for example, would be one area where you could have one CPU core doing the actual dictation and the other core handling the rest of the workload on the PC.

George Ou is Technical Director of ZDNet.

