February 20th, 2007

Tera-Scale: What Would We Do with All These Cores and How Would We Feed Them?

Posted by Justin Rattner @ 12:56 pm

Categories: 80-core, General, Hardware, Memory, Software

Last week’s Tera-scale announcement at the International Solid State Circuits Conference (ISSCC) certainly created a lot of buzz in the press and on the Web. I have to admit being somewhat surprised by how extensively the story was picked up, not just in the technical press, but the popular press as well. From the many interviews I did, it was quite clear that people have an insatiable desire to know what their future computing devices will do and how soon they will do it. Fortunately, researchers at Intel and elsewhere have spent several years, not just thinking about the question, but actually building prototypes of those next-decade applications. Believe me when I say it’s much more credible to talk about a specific example than just blow some smoke and promise that whatever those applications are, they will be really cool.

Back to Recognition, Mining, and Synthesis

I first addressed the issue of why now is the time to create these ideas in my post Cool Codes in which I introduced the RMS categories. The important point is there is an entirely new breed of applications waiting to be invented that doesn’t simply benefit from Tera-scale performance, it requires it. Let me refresh you on RMS by talking about real-time motion capture and rendering and a few other examples to illustrate the idea.

Today, producing a Pixar-quality image takes about 6 hours of computing on a current-generation, dual-processor rack-mount server. That's to render one frame out of the 144,000 frames required for a feature-length, animated movie. How cool would it be if you could bring that quality of image rendering to your desktop in real-time? Imagine playing the Cars video game with imagery that's comparable to what you see in the theater. To create that user experience, we have to go from 6 hours per frame to 1/24th of a second per frame, but at least it’s a very well-characterized computational improvement. It will take a combination of teraFLOPS of computing power and huge advances in the algorithms that render the image. Note that synthesis is the “S” in RMS, and this is but one example.
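To put a number on that improvement, here is a back-of-the-envelope calculation using only the figures above, and assuming that "real-time" means film's 24 frames per second. The variable names are purely illustrative.

```python
# Figures taken from the paragraph above.
SECONDS_PER_FRAME_TODAY = 6 * 60 * 60      # 6 hours per frame on a dual-processor server
TARGET_FRAMES_PER_SECOND = 24              # assumed real-time rate (film frame rate)
FRAMES_PER_FILM = 144_000                  # feature-length animated movie

# Required speedup: from 21,600 s per frame down to 1/24 s per frame.
speedup = SECONDS_PER_FRAME_TODAY * TARGET_FRAMES_PER_SECOND

# Time to render the whole film on a single server today, in years.
film_render_years = SECONDS_PER_FRAME_TODAY * FRAMES_PER_FILM / (365 * 24 * 60 * 60)

# Running time of the film at 24 frames per second, in minutes.
film_minutes = FRAMES_PER_FILM / TARGET_FRAMES_PER_SECOND / 60

print(f"speedup needed: {speedup:,}x")
print(f"single-server render time: {film_render_years:.1f} years")
print(f"film length: {film_minutes:.0f} minutes")
```

The required speedup comes out at roughly half a million, between five and six orders of magnitude, which is why hardware alone is unlikely to close the gap without better rendering algorithms.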

By the way, synthesis is not just about making pictures. It's making sounds, making things move and interact with one another in physically accurate ways. When an animated character speaks in these future desktop animations, their facial muscles will move exactly as they do when a real person speaks. It does raise the question of whether we’ll actually need actors at some point, but that’s a topic for another blog.

Here’s another example: Today in our labs we can data mine the imagery found in a recorded multi-camera video of an individual moving within a defined 3D space. The goal of this video stream mining is to extract their full body motion. We can’t quite do it in real-time at this point, but we are pretty close and there’s no need for marks or lights on the clothing or a background blue screen to do it. By the way, mining is the M in RMS.

Once we have the body motion information, we use it to animate a skeletal model of a human. It’s the skeletal model that makes sure we have the kinematics right and the motion is consistent with how people move. At that point, we can put the “skin on the bones” to create a fully synthetic person moving identically to the real one. Adding lights, shadows, and reflections to our little virtual world gives us a synthetic figure moving naturally and accurately within it.

If you started to think how the above technology could replace the Wii handheld remote controllers, you’ve got the idea. Future video entertainment will use full-body motion capture to put your virtual self in the game, dance instruction, or Tai Chi lesson.

Take out the Noise, Take out the Shake

Most of us have cassettes full of VHS quality (or worse) home video. When we put it up on our new 50-inch HD displays, it simply looks awful. Adding video cameras to cell phones has further exacerbated the problem. Fortunately, there is a way to rescue these old videos. The technique is called super-resolution and it takes advantage of the tremendous amount of redundancy in a video stream.  Using statistical techniques, we can dramatically reduce camera shake, improve resolution, and fix a variety of other visual problems by exploiting all the extra information provided by each frame. Imagine being able to bring all your cell phone videos up to standard definition quality and reprocess those “obsolete” DVDs into high-definition DVDs. It’s a Tera-scale problem for sure, and the reconnaissance satellite folks have been doing it for years. It’s time to make it safe for home use.
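The redundancy argument can be illustrated with the simplest possible case: averaging several noisy captures of the same scene. This is only a sketch of the statistical idea, not super-resolution itself, which also has to align the frames (removing shake) and reconstruct the image on a finer pixel grid. The scene, the noise level, and the function names are all made up for the example.

```python
import random
import statistics

def noisy_frame(scene, noise_sigma, rng):
    """Return one simulated capture of the scene with Gaussian sensor noise."""
    return [pixel + rng.gauss(0.0, noise_sigma) for pixel in scene]

def average_frames(frames):
    """Average already-aligned frames pixel by pixel."""
    return [statistics.fmean(pixels) for pixels in zip(*frames)]

def rms_error(estimate, truth):
    """Root-mean-square difference between an estimate and the true scene."""
    return (sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth)) ** 0.5

rng = random.Random(0)                                # fixed seed so runs are repeatable
scene = [rng.uniform(0, 255) for _ in range(2000)]    # hypothetical "true" image row
frames = [noisy_frame(scene, noise_sigma=20.0, rng=rng) for _ in range(16)]

single_error = rms_error(frames[0], scene)
averaged_error = rms_error(average_frames(frames), scene)
print(f"one frame: {single_error:.1f}, 16 frames averaged: {averaged_error:.1f}")
```

For independent noise the error of the average shrinks roughly with the square root of the number of frames, so every extra frame carries usable information about the same underlying picture.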

How Is It Possible to Feed Such a Beast?

Silent E was right in pointing out that memory capacity and bandwidth have to match or the cores will “starve” and users will not see the performance benefits. It’s relatively easy to pack a lot of processing power on a single chip. It’s much, much harder to provision the memory and I/O bandwidth to keep those processors productive. Fortunately, there are several approaches which promise to meet the future needs. Let me briefly mention two of them.

First, we need to bring more memory closer to the processors, and three approaches do this with varying degrees of bandwidth and capacity. The first is to use system-in-package (SIP) technology to place memory chips in the same package as the processor. Microsoft uses this approach in the Xbox 360. The next approach is to stack a memory chip underneath the processor, which is what we have planned as a future experiment with the Tera-scale Research Processor. Finally, there is embedding DRAM on the processor, as IBM described last week at ISSCC. Much work is required to decide which approach is best in a given situation, but the point is there is more than one solution.

Getting data on and off the chip is also a challenge. While we continue to push electrical signaling to higher and higher speeds, optical signaling is an increasingly attractive option. Costs are coming down and may decline even further when we move to silicon-based photonic solutions. If we can approach electrical costs, but still provide the flexibility and interference advantages of optical, we might just go optical. Once you make that transition, things look good out to about 10 terabits per second per fiber, which should keep us going for a little while to say the least.

Tera-scale keeps sounding more and more fun. Stay tuned as I continue to paint the complete picture. The blog is long overdue for a discussion of the programming challenges ahead.

February 12th, 2007

80 isn't nearly enough

Posted by Justin Rattner @ 11:27 am

Categories: 80-core, General, Hardware, Supercomputing

What an exciting week this has been. We unleashed the ‘Era of Tera’ by showcasing the world’s first programmable processor that can deliver Teraflops performance with remarkable energy efficiency.

It’s rather extraordinary that after decades of single core processors, the high volume processor industry has gone from single to dual to quad-core in just the last two years. Moore’s Law scaling should easily let us hit the 80-core mark in mainstream processors within the next ten years and quite possibly even less. It is therefore reasonable to ask the question: what are we going to do with this sudden abundance of processors?
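The ten-year estimate can be checked with a quick extrapolation. This sketch assumes that core counts start at four today and double at a fixed cadence; both the two-year and the 18-month cadence are assumptions for illustration, not figures from the post.

```python
import math

def years_to_reach(target_cores, current_cores=4, years_per_doubling=2.0):
    """Years until the core count reaches the target, if it doubles at a fixed cadence."""
    doublings = math.log2(target_cores / current_cores)
    return doublings * years_per_doubling

print(f"{years_to_reach(80):.1f} years at one doubling every 2 years")
print(f"{years_to_reach(80, years_per_doubling=1.5):.1f} years at one doubling every 18 months")
```

Going from 4 to 80 cores takes a little over four doublings, so either cadence lands inside the ten-year window.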

The answer is somewhat obvious on the server side of things. More cores and more threads mean more transactions per unit time, assuming that all those cores are given the necessary memory and I/O bandwidth. Other computationally intensive applications in scientific and engineering computing are also likely beneficiaries. I’m talking about seismic analysis, crash simulation, molecular modeling, genetic research, and fluid dynamics.

On the client end of the wire, things aren’t as obvious or straightforward, but they are no less interesting. The abundance of cores is likely to lead to a very different approach to resource allocation. For decades operating systems have been optimized for managing the very scarce processor resources, by cleverly multiplexing many tasks or threads across one or now two or four cores. As quality of service has become more important to users, we’ve all come to realize the limitations of this approach as frames get dropped from video streams or productivity applications pause while the video goes full tilt. A different approach, and one that probably hasn’t received enough attention from the research community, is to dedicate cores to providing particular functions. The allocations become more static than what we see today, but they can certainly be changed over longer periods of time ranging from seconds to hours or even days.

As an example, we could conceive of a multi-function computing appliance that contains a processor with perhaps three dozen cores: we might allocate four of those cores to running the core productivity and collaboration applications. Another cluster of cores, on the order of a dozen, might provide very high quality graphics and visualization. Media processing, beyond encode/decode which would best be handled by dedicated hardware, would be the responsibility of yet another cluster of, say six cores. Still other clusters might do real-time data mining on various streams of data flowing in from the Internet. Various bots operating within this cluster might be assembling news, shopping, or investing. The key idea here is to let the abundant hardware resources replace a lot of very complex OS code. It’s replaced by cluster or partition management code, which doles out the resources, but stays out of the way until there’s a major shift in the workload.
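A minimal sketch of what such partition management code might look like, using the core counts from the example above. The function and cluster names are hypothetical, and "three dozen" and "a dozen" are taken literally as 36 and 12.

```python
def partition_cores(total_cores, requests):
    """Assign contiguous core ID ranges to named clusters.

    `requests` maps cluster name -> number of cores; any cores left over
    are returned under the name "unallocated".
    """
    if sum(requests.values()) > total_cores:
        raise ValueError("requested more cores than the processor has")
    plan, next_core = {}, 0
    for name, count in requests.items():
        plan[name] = list(range(next_core, next_core + count))
        next_core += count
    plan["unallocated"] = list(range(next_core, total_cores))
    return plan

# Counts from the example above.
plan = partition_cores(36, {"productivity": 4, "graphics": 12, "media": 6})
for name, cores in plan.items():
    print(f"{name:13s} {len(cores):2d} cores")
```

The point of the sketch is how little logic is involved: the plan is computed once and then left alone until the workload shifts, at which point the same function is simply called again with new counts. The 14 cores left over would be available for the data-mining clusters. On Linux, a process could be pinned to one of these core sets with `os.sched_setaffinity`, though that is only one possible mechanism.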

TJGeezer suggested using Tera-Scale capability along with huge amounts of NAND in an iPOD size container for AI applications. He may be right. One can easily imagine clusters of cores supporting an advanced human interface with real-time speech and vision or language translation. A lot of algorithmic development would have to take place to make this feasible, but there is no doubt in my mind that we’ll have the hardware resources needed to host them. The statistical algorithms that will form the heart of these future recognition systems are highly parallel and thus a great fit for a high core count architecture.

An abundance of cores also enables new ways to deal with challenges associated with system operation in the face of device failures and cosmic radiation. Think of the collection of cores as a redundant array of computing engines (RACE). Two or more cores could be used in tandem to detect and correct faults. If a core becomes unreliable, it can simply be removed from service without significantly affecting overall system performance.
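One well-known way to use redundant cores is majority voting: run the same computation on several cores and compare the answers. The sketch below is an illustration of that general technique, not a description of any Intel design; the `vote` function and the simulated bit flip are assumptions made for the example.

```python
from collections import Counter

def vote(results):
    """Majority-vote over results from redundant cores.

    Returns (value, suspects): the agreed value and the indices of cores
    that disagreed with it. Raises if no strict majority exists.
    """
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError("no majority: fault detected but not correctable")
    suspects = [i for i, r in enumerate(results) if r != value]
    return value, suspects

# Three hypothetical cores run the same computation; core 1 suffers a bit flip.
value, suspects = vote([42, 42 ^ 0b100, 42])
print(value, suspects)
```

With two cores a disagreement can only be detected; with three or more, the majority can outvote the faulty core, and the list of suspects tells the system which core to consider removing from service.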

As we pack more and more computing resources into smaller areas, managing power and heat in a very fine grain manner will be critical. If we have more cores than are needed to execute the desired set of workloads, we can swap threads between cores whenever one becomes too hot. It’s like the hot potato game – move the potato fast enough and you never get burned. We’ll need the ability to adjust supply voltages, operating frequencies, and sleep states of individual cores in matters of microseconds.
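The hot potato idea can be tried out in a toy simulation. Everything here is invented for illustration: the thermal constants have no physical basis, and the model simply warms the busy core and cools the idle ones by fixed amounts per tick.

```python
def simulate(num_cores, steps, migrate, heat=3.0, cool=1.0, ambient=40.0, limit=70.0):
    """Run one thread for `steps` ticks and return the peak core temperature.

    Toy model: the busy core warms by `heat` degrees per tick, idle cores
    cool by `cool` degrees per tick down to `ambient`. With migrate=True the
    thread hops to the coolest core whenever its core is about to pass `limit`.
    """
    temps = [ambient] * num_cores
    busy, peak = 0, ambient
    for _ in range(steps):
        if migrate and temps[busy] + heat > limit:
            busy = min(range(num_cores), key=lambda c: temps[c])
        for c in range(num_cores):
            if c == busy:
                temps[c] += heat
            else:
                temps[c] = max(ambient, temps[c] - cool)
        peak = max(peak, temps[busy])
    return peak

print("pinned to one core:", simulate(4, 200, migrate=False))
print("hot potato        :", simulate(4, 200, migrate=True))
```

In this model four cores are enough for each one to cool back to ambient before the thread returns to it, so the peak never passes the limit. With too few spare cores the same policy eventually runs out of cool places to go, which is why having more cores than the workload needs matters.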

While the challenges are somewhat mind-boggling on both the hardware and software sides to develop and fully utilize these future Tera-Scale platforms, the benefits and opportunities from putting these computing capabilities into the hands of all users are equally incredible.

So how many cores could you use, and what would you use them for? ArsTechnica user dg65536 said it best in his post – “Now that I think about it…80 isn't nearly enough.”

December 18th, 2006

Polaris Points the Way to Terascale Computing

Posted by Justin Rattner @ 2:12 pm

Categories: General, Hardware

Two months ago at the Intel Developer Forum, Intel’s CEO, Paul Otellini, unveiled a 300mm wafer that contained hundreds of massively multi-core prototype processors each consisting of 80 simple, but programmable floating-point cores. While it was an early wafer fresh from Intel’s Fab 24 in Ireland, it generated a lot of attention and discussion in the press. Numerous excellent points (both positive and negative) were raised – with most of the points centered on what you would do with this many cores and how you would program them. I’ll share my thoughts on these points in a later blog, but today I wanted to give a status update on what we call the Polaris prototype.

Just two weeks ago we received the first packaged Polaris processors. Within two hours of power-up, the very first chip in the test fixture reached 1.02 TFLOPS at 3.2 GHz while consuming less than 100W. The fact that we broke the TFLOPS barrier on A0 silicon is just amazing. It’s very special for me because it comes almost exactly a decade to the day after ASCI Red was the first system in the world to break that barrier – but consumed over 500 kW and 2500 square feet of computing space to do it.
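The comparison can be made concrete with the figures quoted above. Two things in this sketch are assumptions: ASCI Red is taken to deliver roughly 1 TFLOPS (it "broke the barrier"), and the power figures are used as bounds, so the efficiency gain is a lower bound.

```python
# Figures quoted in the paragraph above.
polaris_flops = 1.02e12        # 1.02 TFLOPS
polaris_watts = 100            # upper bound: "less than 100W"
polaris_cores = 80
polaris_hz = 3.2e9             # 3.2 GHz

asci_red_flops = 1.0e12        # assumed: roughly 1 TFLOPS when it broke the barrier
asci_red_watts = 500e3         # lower bound: "over 500 kW"

polaris_gflops_per_watt = polaris_flops / polaris_watts / 1e9
asci_red_gflops_per_watt = asci_red_flops / asci_red_watts / 1e9
efficiency_gain = polaris_gflops_per_watt / asci_red_gflops_per_watt

# Floating-point operations each core must complete per clock cycle.
flops_per_core_per_cycle = polaris_flops / (polaris_cores * polaris_hz)

print(f"Polaris : at least {polaris_gflops_per_watt:.1f} GFLOPS per watt")
print(f"ASCI Red: at most {asci_red_gflops_per_watt:.3f} GFLOPS per watt")
print(f"gain    : at least {efficiency_gain:,.0f}x")
print(f"per core: {flops_per_core_per_cycle:.2f} FLOPs per cycle")
```

On these numbers the energy efficiency improved by more than three orders of magnitude in a decade, and each of the 80 cores is sustaining close to four floating-point operations per clock.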

While this 80-core system is still very much an experimental design (go to the International Solid State Circuits Conference, session 5, to get all the technical details), it does point the way to the near future when teraFLOPS capable designs will be commonplace. Just think – within the past two years the industry has gone from single to dual to quad-core – and by Moore’s Law extrapolation, we’ll hit the 80-core mark with production processors in less than ten years.

December 4th, 2006

Mind the Gap

Posted by Justin Rattner @ 1:01 pm

Categories: General, Hardware, Memory

The last few months have been hectic to say the least. After the Intel Developer Forum in late September, I’ve been flying around the planet more or less non-stop. When I was in Europe and Russia last month, before heading to Japan and China just before Thanksgiving, a familiar phrase from the London Underground reminded me of a topic that I’ve wanted to blog about for some time – namely, closing the main memory – bulk storage latency gap that has plagued computer architecture for the last four decades.

Mind the Gap

At the fast end of the memory hierarchy, excluding on-chip caches, we have low-latency DRAM memory, but at over $100 per gigabyte, a lot of PCs still ship with only half a gigabyte. While a gig of DRAM may seem like a lot to those of us who can remember when the PDP-8 had only 4KB of core memory and a paper tape reader, a gigabyte is not nearly enough to hold my Outlook archive folders, a high-def movie, or a desktop search index file. I’m sure you share that frustrating feeling when you see your hard-drive light turn on and stay on as an application launches or more data gets paged into memory from disk.

Moving out one level in the hierarchy, magnetic disk has been the bulk storage technology of choice for decades. While disk continues to grow in capacity with relatively fixed cost, those capacity improvements have not been matched with similar reductions in random access latency. Over the past 10 years alone, processor performance has increased by over 30X while measured hard-drive performance has increased by only 1.3X.  And, the gap will continue to grow as processor performance scaling moves to the new multi-core trajectory.

To put a finer point on it, we’ve had to make do with a factor of 100,000 difference between DRAM and HDD performance (random read latency of 150 nanoseconds vs. 15 milliseconds) and about two orders of magnitude in cost per bit for equivalent capacity. The trade-off between main memory and hard disk performance and cost affects system design and software design in fundamental and profound ways.

Coping with the Gap

Minding the gap means application developers must constantly manage the placement of data. They need to anticipate huge latency hits that can occur in a seemingly random fashion when a desired datum is not in memory. And, they need to anticipate the different target system configurations which will have a direct bearing on how the user perceives application performance.

OS developers have struggled with the gap for decades and have had some modest success hiding it. Virtual memory was invented to relieve application developers of the hassle of managing overlays, but it is very easy to push the notion of virtual memory too far. Push the ratio of virtual to physical memory too high and “a thrashing we will go” as paging rates turn exponential. The tendency of virtual memory systems to exhibit such poor performance when configured with too little DRAM has given rise to the belief that, “virtual memory is a great idea as long as you never use it.” Fortunately, Moore’s law has made doubling the DRAM in the system the usually affordable fix when the disk activity light never seems to go out.
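The thrashing effect follows directly from the latency gap. A simplified model, using the 150 nanosecond and 15 millisecond figures quoted earlier and ignoring queueing at the disk, treats the average access time as a weighted mix of DRAM hits and page faults. The function name and the chosen fault rates are illustrative.

```python
DRAM_NS = 150            # random read latency quoted earlier in the post
DISK_NS = 15_000_000     # 15 milliseconds, also quoted earlier

def effective_access_ns(fault_rate):
    """Average access time when a fraction `fault_rate` of accesses page in from disk."""
    return (1 - fault_rate) * DRAM_NS + fault_rate * DISK_NS

print(f"disk/DRAM latency ratio: {DISK_NS // DRAM_NS:,}")
for fault_rate in (0.0, 1e-6, 1e-5, 1e-4, 1e-3):
    slowdown = effective_access_ns(fault_rate) / DRAM_NS
    print(f"fault rate {fault_rate:.0e}: {slowdown:6.1f}x slower than pure DRAM")
```

Because the ratio is 100,000 to 1, a fault on just one access in 100,000 already doubles the average access time, and one in a thousand makes memory about a hundred times slower. That is why virtual memory only feels free while the working set fits in DRAM.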

It should come as no surprise that the search for a “gap filling” memory technology has been going for decades. When I joined Intel three decades ago, we explored (at some considerable expense) magnetic bubble memory and later charge-coupled device (CCD) memory. Neither turned out to be dense enough and cheap enough to replace rotating magnetic storage. Many other technologies (e.g. holographic memory and, more recently, polymer memory) have been heralded over the years as being the long sought “gap filler” that will be a bit slower, but much cheaper than DRAM. Unfortunately, none of these widely-trumpeted devices panned out.

And the Winner Is?

What is a surprise is that a relatively unheralded technology, NAND flash memory, the same stuff you find in your digital music player or digital camera, looks like it may be the long-sought “gap filler” even though most people had given up looking. There are two approaches to bringing NAND into the memory hierarchy: so-called NAND disks and platform NAND, where the flash memory is integrated onto the motherboard. Let me leave the NAND disk approach for another blog while I focus on platform NAND for this posting. [Note: I have to slightly violate my promise not to tout future Intel products in this blog, but I’ll try to keep my enthusiasm, which is substantial, well in check.] Platform NAND currently goes by the code name Robson Technology at Intel and is slated for introduction with the next-generation mobile platform, codenamed Santa Rosa, in the first half of 2007.

In its initial configurations, Robson consists of up to 1 GB of NAND flash memory and an intelligent controller that fit either on a PCIe mini-card or directly down on the motherboard. In its Robson configuration, the NAND memory is used as a disk cache to temporarily store both applications and data. Since NAND has latency characteristics in the range of tens of microseconds and is non-volatile (maintains the memory image even when power is removed), it enables near “instant” resume from hibernation and applications launch 2X faster on average on Windows Vista. We also see lower overall platform energy consumption as the hard-drive spins up less frequently. The fact that NAND is typically 7X cheaper than DRAM doesn’t hurt either and makes Robson an excellent technology for filling the gap. Note that I say Robson and not NAND, because using plain NAND flash isn’t good enough to do the job on its own.
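To show what "used as a disk cache" means in practice, here is a tiny read cache that sits in front of a slow backing store. The post does not describe Robson's actual caching policy; least-recently-used (LRU) eviction is an assumption chosen because it is simple, and the class and its parameters are hypothetical.

```python
from collections import OrderedDict

class BlockCache:
    """Tiny LRU read cache standing in for NAND in front of a slow disk."""

    def __init__(self, capacity_blocks, read_from_disk):
        self.capacity = capacity_blocks
        self.read_from_disk = read_from_disk   # callable: block number -> data
        self.blocks = OrderedDict()            # block number -> data, oldest first
        self.hits = self.misses = 0

    def read(self, block_no):
        if block_no in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_no)  # mark as most recently used
            return self.blocks[block_no]
        self.misses += 1
        data = self.read_from_disk(block_no)   # slow path: go to the disk
        self.blocks[block_no] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)    # evict the least recently used block
        return data

cache = BlockCache(capacity_blocks=3, read_from_disk=lambda n: f"block-{n}")
for block_no in [1, 2, 3, 1, 2, 4, 1, 3]:
    cache.read(block_no)
print(cache.hits, cache.misses)
```

Every hit is a read served at flash latency without touching the disk, which is where both the faster application launches and the lower energy consumption would come from.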

Overcoming the Weakness of NAND Flash

The one big issue with NAND as a gap filler is write endurance: NAND flash only supports a limited number of erasure cycles before wearing out. That’s where Robson’s smart controller comes into play. Simply put, it uses clever write-leveling algorithms to spread the block erasures evenly across the array giving the NAND flash memory a service life consistent with the rest of the platform.
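The idea of spreading erasures evenly can be sketched in a few lines. This is a toy model and not Robson's algorithm, which the post does not describe: each write to a logical block is redirected to whichever free physical block has been erased the fewest times, and the stale copy is then erased and recycled. Real controllers also have to handle bad blocks and data that rarely changes.

```python
class WearLevelingFlash:
    """Toy flash translation layer: writes go to the least-erased free block."""

    def __init__(self, num_blocks):
        self.erase_counts = [0] * num_blocks
        self.mapping = {}                      # logical block -> physical block
        self.free = set(range(num_blocks))

    def write(self, logical_block):
        old = self.mapping.get(logical_block)
        # Pick the free physical block that has been erased the fewest times.
        target = min(self.free, key=lambda b: (self.erase_counts[b], b))
        self.free.remove(target)
        self.mapping[logical_block] = target
        if old is not None:
            self.erase_counts[old] += 1        # the stale copy is erased and recycled
            self.free.add(old)
        return target

flash = WearLevelingFlash(num_blocks=8)
for _ in range(800):
    flash.write(logical_block=0)               # hammer a single logical block
print(flash.erase_counts)
```

Even though the software rewrites the same logical block 800 times, the erasures end up shared almost equally among all eight physical blocks. Without the indirection, a single block would absorb every one of them.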

The use of NAND as a disk cache is just the start of a major overhaul of the memory hierarchy. Samsung recently announced notebooks that use NAND to create a solid-state drive, completely eliminating the hard-drive. Further out in time, Intel and others are exploring technologies, such as phase-change memory (PCM), as a replacement for NAND flash. It’s too early to tell if PCM will go the way of magnetic bubble memory or if it will replace NAND flash, but the race is on for the future of non-volatile solid-state memory.

In the not too distant future, we can expect to see magnetic disk drives relegated to the role that tape drives play today, and even DIMMs may vanish from future motherboards. I’ll say more about that in another blog.

These changes will require us to rethink software architecture and implementation, including tuning of the operating system, drivers and applications. But the benefits are so tangible that the course is set and now the work must get done.

Going forward, let’s not just mind the memory/storage gap – it’s time to close the gap for good.

September 1st, 2006

Cool Codes

Posted by Justin Rattner @ 2:05 pm

Categories: General, Software

Monday of last week was one of those “convergence” days. I’m sure you know the feeling. Besides being my 29th wedding anniversary, it was the first day of the Hot Chips conference at Stanford University. Before my wife and I drove out to Half Moon Bay to celebrate, I was on stage at Mem Aud to give the opening keynote of the conference with a talk entitled Cool Codes for Hot Chips and to announce a new multi-core applications initiative. I’ll come back to the latter item in a moment.

The theme of my keynote was very much related to the question I raised in my last post – have we reached the end of applications or are we at the start of a new wave of innovation? Even though many of your comments had assumed I was in the opposite camp, I firmly believe that we are sitting on a plateau just waiting for the next order-of-magnitude leap in computer (and communication) performance and capability to unleash a new age of application innovation.

To get off this application plateau we have to have access to some radically better hardware. Unfortunately, the hardware won’t happen unless the architects (and their bosses) believe there will be software to take advantage of the new hardware. To resolve this chicken-and-egg question, we need to start building and testing working prototypes of these future applications. That’s what we’ve been doing at Intel for the last three years, and I took the opportunity at Hot Chips to call for a community-wide effort along the same lines.

A collection of future applications, ones that take today’s systems beyond their limits, would serve two purposes. First, it would help stimulate much more thinking about what can be and should be done. More programmers would pick up the challenge and start thinking more expansively about the future. Second, it would give architects and engineers a set of working, prototype applications against which to evaluate the efficiency and programmability of their new designs.

Let me share one of the demos that I used at Hot Chips as an example of what’s possible if one has the necessary processing power.

Here’s the basic recipe:

  1. Take input from four cameras located in the corners of a room (Fig. 1a)
  2. Analyze the video streams to extract the location and motion over time of the individual body parts (torso, arms, legs and head) based on a programmed skeletal model
  3. Animate a synthetic human figure with skin using ray-tracing and global illumination within a virtual scene based on the actual kinematics determined in step 2 (Fig. 1b)
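The skeletal model in steps 2 and 3 is what keeps the reconstructed motion physically plausible: bone lengths are fixed, so a pose is fully described by joint angles. A minimal 2D sketch of that idea (forward kinematics) follows. It is a generic textbook technique, not the code behind the demo, and the bone lengths and angles are invented.

```python
import math

def forward_kinematics(origin, bones):
    """Return joint positions for a 2D chain of bones.

    `bones` is a list of (length, angle) pairs; each angle is in radians,
    measured relative to the previous bone.
    """
    x, y = origin
    heading = 0.0
    joints = [(x, y)]
    for length, angle in bones:
        heading += angle                       # angles accumulate down the chain
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        joints.append((x, y))
    return joints

# Hypothetical arm: shoulder at the origin, upper arm 0.30 m, forearm 0.25 m.
shoulder, elbow, wrist = forward_kinematics(
    (0.0, 0.0),
    [(0.30, math.radians(90)), (0.25, math.radians(-90))],
)
print(elbow, wrist)
```

A tracker built this way searches over joint angles rather than over free-floating limb positions, so it can never produce an arm that stretches or detaches, whatever the cameras appear to show.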

Fig. 1a: camera input

Fig. 1b: body tracing output

While live-action movie animations usually sprinkle LEDs over actors wearing dark clothes and then just track the bright lights, the Intel system works without any special markers on the person. You literally walk into the camera-equipped room and it just works.

The applications for this technology are wide open beyond the obvious ones in game play: you might compare your golf swing to that of Tiger Woods or see how you look walking or even dancing in a new outfit without ever putting it on. Given the model has your physical information, you’d know if you need the next size up or if the color isn’t quite right given your skin tone.

This system is appealing to us not because Intel is planning to ship one of these applications, but because it points to a broad new class of algorithms that we refer to as “recognition, mining and synthesis” or RMS.

The recognition stage answers the question “what is it?” – modeling of the body in our prototype system. Mining answers the question “where is it?” – analyzing the video streams to find similar instances of the model. And synthesis answers the question “how is it?” – creating a new instance of the model in some virtual world.

This flow between recognition, mining and synthesis applies beyond the entertainment and visual domains. It works equally well in domains as diverse as medicine, finance, and astrophysics.

Such emerging “killer apps” of the future have a few important attributes in common – they are highly parallel in nature, they are built from a common set of algorithms, and they have, by today’s standards, extreme computational and memory bandwidth requirements, often requiring teraFLOPS of computing power and terabytes per second of memory bandwidth, respectively. Unfortunately the R&D community is lacking a suite of these emerging, highly-scalable workloads in order to guide the quantitative design of our future computing systems.

The Intel RMS suite I mentioned earlier is based on a mix of internally-developed codes, such as the body tracking and animation prototype, and partner developed codes from some of the brightest minds in the industry and academia. As researchers outside of Intel learned more about the suite, they started to ask if we could make it publicly available. Since it contains a mix of Intel and non-Intel code, we couldn’t just place it in open source. A conversation last spring about the suite with my good friend Professor Kai Li of Princeton gave rise to the idea of a new publicly available suite, and my Hot Chips keynote gave me the opportunity to engage the technical community in its development.

At the end of the keynote I announced the creation of a publicly available suite of killer codes for future multi-core architecture research. I also announced that Intel would contribute some of our internally-developed codes in body-tracking and real-time ray tracing to launch the effort. I was also pleased to announce that Professor Ron Fedkiw at Stanford will contribute his physics codes, the University of Pittsburgh Medical Center will add their medical image analysis codes, Professor David Patterson at UC Berkeley will provide codes of the “Seven+ Dwarfs of Parallel Computing”, and Professors Li and JP Singh at Princeton will make additional network and I/O intensive contributions including content-based multimedia search, network traffic processing, and databases.

Professors Li and Singh have graciously offered to manage contributions to the suite and host the repository. A workshop is being arranged for early next year to establish some guidelines on contributions. I’ll provide more information here as the date gets closer.

And that brings me back to the question of when will we have the computational capability to break free from today’s rather quaint applications? Sooner than most people think if we come together to create the future.

August 10th, 2006

The end of applications?

Posted by Justin Rattner @ 11:19 am

Categories: General, Software

Sometimes someone says something at a conference that really knocks me for a loop. Such was the case at the High Performance Computer Architecture Conference last year. In typical panel fashion, a group of us were each given a few minutes to state our position on the future of computer architecture.

The panelists were chosen to represent a broad spectrum of architectural views from the traditional (x86) to the more radical (Cell) along with a software viewpoint. The hardware panelists more or less stuck to their respective party lines, but the software speaker said something that I won’t soon forget, “Since all of the interesting applications have been written, why is it that you guys are still inventing new architectures? What IT managers want now is just lower cost hardware and easier to manage systems. That’s what you should be working on!”

Now I like a provocative panelist as much as anyone, but I just couldn’t swallow the line about the end of applications. I’m squarely in the camp that believes that the truly compelling computer applications have yet to be built.

At first I put the applications comment under the same heading as other famously wrong-headed thoughts about computing such as “only six electronic digital computers would be required to satisfy the computing needs of the entire United States” (Howard Aiken) and “there is no reason anyone would want a computer in their home” (Ken Olsen). The more I thought about it,

Read the rest of this entry »

July 7th, 2006

Your Cards and Letters

Posted by Justin Rattner @ 10:02 am

Categories: General

First, let me thank everyone who took a moment to comment on my first entry. It looks like there is an audience for this kind of commentary, although not everyone was thrilled with me taking the longer view. I’ll do what I can to relate a bit more of what I say to the here and now.

Before I get to your cards and letters, let me highlight two of my week’s more interesting encounters. I was with Rodney Brooks of MIT CSAIL and iRobot fame on an innovation panel at an internal Intel conference in San Diego. Rodney described how fickle NASA had been with innovative ideas like using really cheap robots for planetary exploration. As long as they worked, the money flowed, but as soon as there was a mishap on one such mission, the idea lost a lot of support within the agency. His point was that truly innovative enterprises value failure as much as success, perhaps even more so because there is so much more to understand when you get it wrong. When you get it right, you never know how close you were to disaster.

Robotics turned out to be an unintended theme last week. I spent two fascinating hours with Professors Sebastian Thrun and Chris Gerdes at Stanford. They were two of the key faculty members on Stanley, the autonomous VW Touareg which won the DARPA Grand Challenge car race last fall. We were chatting about where driverless cars might go next and how soon they might get there. We took a break, and Chris let me drive one of his steer-by-wire prototype vehicles. What a trip, so to speak.

The steering wheel has no mechanical connection to the actual steering mechanism. There’s only a shaft encoder that tells the servomotors where to position the wheels. The steering mechanism uses an invention from the late 1950s called harmonic drive. Chris tried to explain the principle behind harmonic drive, but without a picture of it I was completely lost. If you’re interested and would like to learn more about HD of the mechanical kind, I recommend you visit Harmonic Drive LLC and watch the videos. Two of these harmonic drive motors, one for each wheel, do the actual steering.

As a BMW owner of long standing, I know what precise steering feels like, but steer-by-wire takes it to another level. The steering response is vernier-like; it’s simultaneously immediate and precise.

At the moment there is no force feedback to the steering wheel, but Chris plans to add it in the near future. Unlike purely mechanical steering, you’ll be able to adjust the amount of feedback to fit the driver. That would be cool.

Are Special Purpose Processors the Next New Thing?

There was a short thread on general purpose vs. special purpose computing elements as they relate to multi-core processors. It’s a great topic and one I plan to cover in some depth in an upcoming blog. If you can’t wait for that, I suggest you look at Myer and Sutherland’s paper from way back in 1968 entitled On the Design of Display Processors, which describes the architect’s ride on something called The Wheel of Reincarnation. Whenever someone tells me about the great idea they have for a special-purpose processor, I tell them to read this paper and get back to me. It helps keep the lines short.

On the question of returning to single-threaded machines at some time in the future given some yet-to-be-identified technology, I think it’s extremely unlikely. I wouldn’t rule out the possibility of some new logic device allowing for much higher speed operation at reasonable power levels, but there is more to single-thread performance than just clock speed. It’s a combination of clock speed and the number of instructions you can execute in parallel per clock cycle. A new logic device would have to be not only smaller and faster, but much more energy-efficient as well. Until such a device is discovered or invented and proves amenable to high volume manufacture, we will be living with multi-core processors.
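
To put that combination in concrete terms, here is a minimal sketch of the arithmetic. The function name and the sample figures are assumptions chosen for illustration; they are not measurements of any real processor.

```python
def instructions_per_second(clock_hz: float, ipc: float) -> float:
    """Single-thread throughput: clock rate times instructions per cycle."""
    return clock_hz * ipc


# Hypothetical figures, chosen only to illustrate the relationship.
baseline = instructions_per_second(3.0e9, 2.0)      # 3 GHz, 2 instructions/cycle
faster_clock = instructions_per_second(6.0e9, 1.0)  # 6 GHz, 1 instruction/cycle

# Doubling the clock buys nothing if the instructions per cycle are halved.
print(baseline == faster_clock)
```

In this toy model, a device that only raises the clock rate does not help unless it also sustains the same instructions per cycle, which is one way to read the point that clock speed alone is not the whole story.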

Coming Soon to a Computer Near You

The good news is that necessity is the mother of invention, and I’m optimistic that we will see the programming breakthroughs in the next few years that will make it much easier to write efficient, correct, and modular programs for multi-core processors. I’ll talk about some of the work-in-progress at Intel and elsewhere that has me feeling so positive about the future of general-purpose parallel programming. I’m fine if some of you want to be in denial for the next few years as so many have been over global warming, but isn’t it better to ride the wave than get crushed by it?

I’d Take a Good Algorithm Any Day

On the research side of things at Intel, we spend as much time, if not more, looking at algorithmic improvements as we do at hardware improvements. Engineering a 10x improvement in hardware takes years, but a good programmer with the right tools can do it in a day. It’s another reason why I am so wary of special-purpose processors. A better algorithm on a general-purpose processor can run rings around a poorer algorithm on a special-purpose processor. Look for a future blog in which I’ll explain how this is exactly what’s happened to today’s most successful special-purpose processor and why it’s time for general-purpose processors to make a stunning comeback. You won’t want to miss that one.
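
As a rough illustration of the algorithm-versus-hardware argument, the sketch below counts the work done by two ways of checking a list for duplicates. The task, the function names, and the 10x hardware speedup are all assumptions made up for this example; they do not describe any Intel project or real processor.

```python
def has_duplicate_quadratic(items):
    """Compare every pair: about n*(n-1)/2 comparisons in the worst case."""
    comparisons = 0
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            comparisons += 1
            if items[i] == items[j]:
                return True, comparisons
    return False, comparisons


def has_duplicate_hashed(items):
    """One set lookup per item: about n lookups in the worst case."""
    seen = set()
    lookups = 0
    for item in items:
        lookups += 1
        if item in seen:
            return True, lookups
        seen.add(item)
    return False, lookups


data = list(range(2000))  # no duplicates, so both functions hit their worst case
_, slow_work = has_duplicate_quadratic(data)
_, fast_work = has_duplicate_hashed(data)

# Assumed speedup for a hypothetical special-purpose chip running the slow method.
HYPOTHETICAL_HW_SPEEDUP = 10

# Even after dividing by the hardware speedup, the slow method does more work.
print(slow_work / HYPOTHETICAL_HW_SPEEDUP > fast_work)
```

The counts are only a proxy for running time, but they show the shape of the argument: the gap between the two methods grows with the size of the input, while a fixed hardware speedup does not.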

And that’s it until next time.

June 26th, 2006

The green flash

Posted by Justin Rattner @ 1:00 pm

Categories: General

Tags: Job, Industry, Blog, Intel Corp., Blogging, Strategy, Processors, Internet, Management, Semiconductors

When the prospect of doing a blog came across my screen, I turned to my most trusted advisor on such matters, one of my 19-year-old twin boys. He’s a technology buff in his own right and a freshman in EECS at Oregon State University. I managed to get him to look up from his latest copy of Wired for a few moments to answer my question, “If I were to do a blog, what would you want me to write about?” His answer convinced me that I might have something unique to say: “People always want to know what will be new and cool, not just next year or even the year after, but in five years or even ten years and that’s what you’re really good at explaining.” With my mission statement so quickly crafted, I accepted the blog offer, and this is installment one.

Unfortunately, you won’t find much of the long-range stuff in this first entry. Before jumping in, I thought it best to get the start-up information out of the way and, hopefully, save you from asking me a bunch of obvious questions. That’s not to say I don’t look forward to answering your questions, because I do, but I’d rather keep them focused on the technical topics to be discussed in the upcoming entries rather than all the whys and wherefores of doing a blog for ZDNet.

First off, I hope to give you a unique vantage point on the future of information technology. Most of what you read in print and online is at least one extra hop from the people doing the work. Even then it is generally after the fact or at least after the prototype is working. My job will be to eliminate the middle man, so to speak, and get you closer to the leading or, as we often say, the bleeding edge. Since part of my responsibilities includes directing a 1000-person research organization, a lot of what I hope to say will be real-time. We’ll go inside the labs and talk about the ideas and experiments now underway that will define the future of information and communication technology (ICT) in five to ten years.

Weaving technology and opinion

This leads to an obvious question: If I’m Intel’s CTO, why isn’t this blog on the Intel site? The simple answer is that Intel’s corporate site is not a place for personal opinion. It’s the place you go for information about new products or standards efforts. If that’s what you’re looking for, then please go there. What I plan to do here will be technical — but also rife with opinion and, hopefully, insight you’re not likely to get anywhere else.

Sure, when there is new and relevant Intel product technology, I’ll talk about it. It would be foolish not to. But my real focus will be on the ideas and innovations that will underlie information and communication systems of tomorrow. Similarly, I won’t be discussing the competitive environment or engaging in a lively debate over who has the best game platform. There are plenty of other web sites and blogs for having those discussions, so I won’t deal with them here.

Connecting the dots

One of the things I will try to do is seek out the technological inflection points, a term Intel’s legendary ex-CEO, Andy Grove, made a household phrase more than a decade ago. It’s often easy to miss them when you just look at the individual data points. My job will be to put them into context to see if they point to an impending inflection point in technology. Perhaps an historical example would be useful.

The move to multi-core processors caught a lot of people by surprise. I was shocked when I heard an industry visionary say he thought multi-core was just a short-term workaround until we got the semiconductor processes back on track. He was literally expecting us to invent a new transistor that would be super fast and eliminate all the problems with power. Most importantly, from his point of view, it would let us get back to building screaming fast single-threaded processors. When he saw our 5-year processor roadmap had only multi-core processors from 2006 onward, he finally realized that the industry had reached a true inflection point. If I’m doing my job here, you won’t be similarly surprised in five years (more or less) when something equally dramatic occurs.

Another thing worth noting is that many of the data points will come from outside of Intel. As Bill Joy, another industry legend, once said, “Most of the smart people work for some other company.” It follows that most of the good ideas come from elsewhere, too. Since talking to technology leaders across industry and academia is part of my job description, I’ll do my best, without breaking confidentiality, to bring you their insights and observations. I’ll also try to give you the global perspective of the trends and ideas shaping information and communication technology. As Intel’s network of research labs reaches across the U.S. to Europe, the Middle East, Asia, and now Mexico, I’ll share the views of the technology and policy leaders from around the world.

Let the dialog begin

I welcome your questions and comments, as well as your ideas for future topics. Feel free to point me to key breakthroughs, inflections or scenarios that I’ve missed. I’ll address the most interesting ones, and I’ll try to do so in a timely fashion.

Enough housekeeping. Now that you’ve seen the green flash, we can start to look over the horizon.

Justin Rattner is an Intel Senior Fellow and director of Intel's Corporate Technology Group. He also serves as the corporation's chief technology officer. The opinions expressed in this blog are his own and not those of his employer.
