Category: 80-core

February 20th, 2007

Tera-Scale: What Would We Do with All These Cores and How Would We Feed Them?

Posted by Justin Rattner @ 12:56 pm

Categories: 80-core, General, Hardware, Memory, Software

Last week’s Tera-scale announcement at the International Solid State Circuits Conference (ISSCC) certainly created a lot of buzz in the press and on the Web. I have to admit being somewhat surprised by how extensively the story was picked up, not just in the technical press, but the popular press as well. From the many interviews I did, it was quite clear that people have an insatiable desire to know what their future computing devices will do and how soon they will do it. Fortunately, researchers at Intel and elsewhere have spent several years, not just thinking about the question, but actually building prototypes of those next-decade applications. Believe me when I say it’s much more credible to talk about a specific example than just blow some smoke and promise that whatever those applications are, they will be really cool.

Back to Recognition, Mining, and Synthesis

I first addressed the issue of why now is the time to create these ideas in my post Cool Codes in which I introduced the RMS categories. The important point is there is an entirely new breed of applications waiting to be invented that doesn’t simply benefit from Tera-scale performance, it requires it. Let me refresh you on RMS by talking about real-time motion capture and rendering and a few other examples to illustrate the idea.

Today, to produce a Pixar-quality image takes about 6 hours of computing on a current-generation, dual-processor rack-mount server. That's to render one frame out of the 144,000 frames required for a feature-length, animated movie. How cool would it be if you could bring that quality of image rendering to your desktop in real-time? Imagine playing the Cars video game with imagery that's comparable to what you see in the theater. To create that user experience, we have to go from 6 hours per frame to 1/24th of a second per frame, but at least it’s a very well-characterized computational improvement. It will take a combination of teraFLOPS of computing power and huge advances in the algorithms that render the image. Note that synthesis is the “S” in RMS, and this is but one example.
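
To put a number on that gap, here is a rough back-of-the-envelope sketch. It uses only the figures quoted above and assumes the real-time target is 24 frames per second (one frame every 1/24 of a second), which is consistent with 144,000 frames making up a feature-length film. The function names are illustrative only.

```python
# Back-of-the-envelope arithmetic for the rendering gap described above.
# Assumption: "real-time" means 24 frames per second.

SECONDS_PER_FRAME_TODAY = 6 * 60 * 60  # about 6 hours per frame on one server
FRAMES_PER_SECOND = 24                 # assumed real-time target
FRAMES_IN_FEATURE = 144_000            # frames in a feature-length movie


def required_speedup(seconds_per_frame: int, frames_per_second: int) -> int:
    """How many times faster rendering must get to reach real time."""
    return seconds_per_frame * frames_per_second


def feature_runtime_minutes(frames: int, frames_per_second: int) -> float:
    """Playback length of the movie in minutes."""
    return frames / frames_per_second / 60


def offline_render_hours(frames: int, seconds_per_frame: int) -> float:
    """Total server-hours to render every frame on a single server."""
    return frames * seconds_per_frame / 3600


if __name__ == "__main__":
    print(required_speedup(SECONDS_PER_FRAME_TODAY, FRAMES_PER_SECOND))      # 518400
    print(feature_runtime_minutes(FRAMES_IN_FEATURE, FRAMES_PER_SECOND))     # 100.0
    print(offline_render_hours(FRAMES_IN_FEATURE, SECONDS_PER_FRAME_TODAY))  # 864000.0
```

In other words, going from 6 hours to 1/24th of a second per frame is a speedup of roughly half a million times, which is why both more hardware and better algorithms are needed.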

By the way, synthesis is not just about making pictures.  It's making sounds, making things move and interact with one another in physically accurate ways. When an animated character speaks in these future desktop animations, their facial muscles will move exactly as they do when a real person speaks. It does beg the question whether we’ll actually need actors at some point, but that’s a topic for another blog.

Here’s another example: Today in our labs we can data mine the imagery found in a recorded multi-camera video of an individual moving within a defined 3D space. The goal of this video stream mining is to extract that individual’s full-body motion. We can’t quite do it in real-time at this point, but we are pretty close and there’s no need for marks or lights on the clothing or a background blue screen to do it. By the way, mining is the “M” in RMS.

Once we have the body motion information, we use it to animate a skeletal model of a human. It’s the skeletal model that makes sure we have the kinematics right and the motion is consistent with how people move. At that point, we can put the “skin on the bones” to create a fully synthetic person moving identically to the real one. Adding lights, shadows, and reflections to our little virtual world gives us a synthetic figure moving naturally and accurately within it.

If you started to think how the above technology could replace the Wii handheld remote controllers, you’ve got the idea. Future video entertainment will use full-body motion capture to put your virtual self in the game, dance instruction, or Tai Chi lesson.

Take out the Noise, Take out the Shake

Most of us have cassettes full of VHS quality (or worse) home video. When we put it up on our new 50-inch HD displays, it simply looks awful. Adding video cameras to cell phones has further exacerbated the problem. Fortunately, there is a way to rescue these old videos. The technique is called super-resolution and it takes advantage of the tremendous amount of redundancy in a video stream.  Using statistical techniques, we can dramatically reduce camera shake, improve resolution, and fix a variety of other visual problems by exploiting all the extra information provided by each frame. Imagine being able to bring all your cell phone videos up to standard definition quality and reprocess those “obsolete” DVDs into high-definition DVDs. It’s a Tera-scale problem for sure, and the reconnaissance satellite folks have been doing it for years. It’s time to make it safe for home use.

How Is It Possible to Feed Such a Beast?

Silent E was right in pointing out that memory capacity and bandwidth have to match or the cores will “starve” and users will not see the performance benefits. It’s relatively easy to pack a lot of processing power on a single chip. It’s much, much harder to provision the memory and I/O bandwidth to keep those processors productive. Fortunately, there are several approaches which promise to meet the future needs. Let me briefly mention two of them.

First, we need to bring more memory closer to the processors, and three approaches do this with varying degrees of bandwidth and capacity. The first is to use system-in-package (SIP) technology to place memory chips in the same package as the processor. Microsoft uses this approach in the Xbox 360. The next approach is to stack a memory chip underneath the processor, which is what we have planned as a future experiment with the Tera-scale Research Processor. Finally, there is embedding DRAM on the processor, as IBM described last week at ISSCC. Much work is required to decide which approach is best in a given situation, but the point is there is more than one solution.

Getting data on and off the chip is also a challenge. While we continue to push electrical signaling to higher and higher speeds, optical signaling is an increasingly attractive option. Costs are coming down and may decline even further when we move to silicon-based photonic solutions. If we can approach electrical costs, but still provide the flexibility and interference advantages of optical, we might just go optical. Once you make that transition, things look good out to about 10 terabits per second per fiber, which should keep us going for a little while to say the least.

Tera-scale keeps sounding more and more fun. Stay tuned as I continue to paint the complete picture. The blog is long overdue for a discussion of the programming challenges ahead.

February 12th, 2007

80 isn't nearly enough

Posted by Justin Rattner @ 11:27 am

Categories: 80-core, General, Hardware, Supercomputing

What an exciting week this has been. We unleashed the ‘Era of Tera’ by showcasing the world’s first programmable processor that can deliver Teraflops performance with remarkable energy efficiency.

It’s rather extraordinary that after decades of single-core processors, the high-volume processor industry has gone from single to dual to quad-core in just the last two years. Moore’s Law scaling should easily let us hit the 80-core mark in mainstream processors within the next ten years and quite possibly even less. It is therefore reasonable to ask the question: what are we going to do with this sudden abundance of processors?

The answer is somewhat obvious on the server side of things. More cores and more threads mean more transactions per unit time, assuming that all those cores are given the necessary memory and I/O bandwidth. Other computationally intensive applications in scientific and engineering computing are also likely beneficiaries. I’m talking about seismic analysis, crash simulation, molecular modeling, genetic research, and fluid dynamics.

On the client end of the wire, things aren’t as obvious or straightforward, but they are no less interesting. The abundance of cores is likely to lead to a very different approach to resource allocation. For decades operating systems have been optimized for managing very scarce processor resources by cleverly multiplexing many tasks or threads across one or now two or four cores. As quality of service has become more important to users, we’ve all come to realize the limitations of this approach as frames get dropped from video streams or productivity applications pause while the video goes full tilt. A different approach, and one that probably hasn’t received enough attention from the research community, is to dedicate cores to providing particular functions. The allocations become more static than what we see today, but they can certainly be changed over longer periods of time ranging from seconds to hours or even days.

As an example, we could conceive of a multi-function computing appliance that contains a processor with perhaps three dozen cores. We might allocate four of those cores to running the core productivity and collaboration applications. Another cluster of cores, on the order of a dozen, might provide very high quality graphics and visualization. Media processing, beyond encode/decode which would best be handled by dedicated hardware, would be the responsibility of yet another cluster of, say, six cores. Still other clusters might do real-time data mining on various streams of data flowing in from the Internet. Various bots operating within these clusters might be assembling news, shopping, or investing information. The key idea here is to let the abundant hardware resources replace a lot of very complex OS code. It’s replaced by cluster or partition management code, which doles out the resources, but stays out of the way until there’s a major shift in the workload.
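
The allocation described above can be sketched in a few lines. This is only a toy illustration, not an actual OS or partition-manager interface: the `Partition` class, the `partition_cores` function, and the cluster labels are hypothetical names, and giving all leftover cores to a single stream-mining cluster is an assumption made for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """A named, contiguous block of core IDs (hypothetical structure)."""
    name: str
    cores: range


def partition_cores(total_cores: int, requests: dict[str, int],
                    remainder_name: str) -> list[Partition]:
    """Statically carve cores into clusters; leftovers go to one extra cluster."""
    if sum(requests.values()) > total_cores:
        raise ValueError("requested more cores than the processor has")
    partitions = []
    next_core = 0
    for name, count in requests.items():
        partitions.append(Partition(name, range(next_core, next_core + count)))
        next_core += count
    if next_core < total_cores:
        partitions.append(Partition(remainder_name, range(next_core, total_cores)))
    return partitions


if __name__ == "__main__":
    # Three dozen cores, split as in the example above.
    layout = partition_cores(
        36,
        {"productivity": 4, "graphics": 12, "media": 6},
        remainder_name="stream_mining",  # assumption: one cluster takes the rest
    )
    for p in layout:
        print(f"{p.name}: cores {p.cores.start}-{p.cores.stop - 1} ({len(p.cores)} cores)")
```

The point of the sketch is that the allocation is computed once and then left alone. Re-running it with a different request table corresponds to the occasional "major shift in the workload."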

TJGeezer suggested using Tera-Scale capability along with huge amounts of NAND in an iPod-size container for AI applications. He may be right. One can easily imagine clusters of cores supporting an advanced human interface with real-time speech and vision or language translation. A lot of algorithmic development would have to take place to make this feasible, but there is no doubt in my mind that we’ll have the hardware resources needed to host them. The statistical algorithms that will form the heart of these future recognition systems are highly parallel and thus a great fit for a high core count architecture.

An abundance of cores also enables new ways to deal with challenges associated with system operation in the face of device failures and cosmic radiation. Think of the collection of cores as a redundant array of computing engines (RACE). Two or more cores could be used in tandem to detect and correct faults. If a core becomes unreliable, it can simply be removed from service without significantly affecting overall system performance.
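
One well-known way to use redundant cores like this is majority voting, as in triple modular redundancy. The sketch below is an illustration of that general technique, not a description of how the RACE idea would actually be built. Cores are modeled as plain Python functions, and the names `vote` and `run_redundantly` are hypothetical.

```python
from collections import Counter
from typing import Callable, Sequence


def vote(results: Sequence[int]) -> tuple[int, list[int]]:
    """Return the majority result and the indices of cores that disagreed."""
    value, count = Counter(results).most_common(1)[0]
    if count <= len(results) // 2:
        # A fault was detected, but there is no majority to correct it with.
        raise RuntimeError("cores disagree and no majority exists")
    suspects = [i for i, r in enumerate(results) if r != value]
    return value, suspects


def run_redundantly(cores: Sequence[Callable[[int], int]], x: int) -> tuple[int, list[int]]:
    """Run the same task on every core and vote on the answers."""
    return vote([core(x) for core in cores])


def good_core(x: int) -> int:
    return x * x


def faulty_core(x: int) -> int:
    return x * x + 1  # simulated bit flip or failing device


if __name__ == "__main__":
    value, suspects = run_redundantly([good_core, faulty_core, good_core], 7)
    print(value, suspects)  # 49 [1]
```

With two cores a disagreement can only be detected; with three or more, the majority also supplies the corrected answer, and the cores listed as suspects are candidates for removal from service.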

As we pack more and more computing resources into smaller areas, managing power and heat in a very fine-grained manner will be critical. If we have more cores than are needed to execute the desired set of workloads, we can swap threads between cores whenever one becomes too hot. It’s like the hot potato game – move the potato fast enough and you never get burned. We’ll need the ability to adjust supply voltages, operating frequencies, and sleep states of individual cores in a matter of microseconds.
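
A toy simulation makes the hot potato idea concrete. Everything numeric here is invented for illustration: the heating and cooling rates, the ambient temperature, and the 80-degree limit are assumptions, not measurements of any real processor, and the function names are hypothetical.

```python
AMBIENT = 40.0  # assumed idle temperature
HEAT = 5.0      # assumed temperature rise per step on the busy core
COOL = 2.0      # assumed temperature drop per step on idle cores
LIMIT = 80.0    # assumed threshold that triggers a migration


def step(temps: list[float], running_on: int) -> tuple[list[float], int]:
    """Advance one time step and return new temperatures and the next core."""
    new_temps = [
        t + HEAT if i == running_on else max(AMBIENT, t - COOL)
        for i, t in enumerate(temps)
    ]
    if new_temps[running_on] >= LIMIT:
        # Pass the potato: move the thread to the coolest core.
        running_on = min(range(len(new_temps)), key=new_temps.__getitem__)
    return new_temps, running_on


def simulate(num_cores: int, steps: int) -> tuple[float, int]:
    """Run one hot thread; return the peak temperature and migration count."""
    temps = [AMBIENT] * num_cores
    core, peak, migrations = 0, AMBIENT, 0
    for _ in range(steps):
        temps, next_core = step(temps, core)
        peak = max(peak, max(temps))
        if next_core != core:
            migrations += 1
        core = next_core
    return peak, migrations


if __name__ == "__main__":
    print(simulate(num_cores=4, steps=100))  # peak stays at the limit
    print(simulate(num_cores=1, steps=100))  # nowhere to move, so it keeps heating
```

With spare cores, the thread always has a cool place to go and the peak temperature never passes the limit. With a single core there is nowhere to pass the potato, which is the case the fine-grained voltage, frequency, and sleep-state controls would have to handle.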

While the challenges are somewhat mind-boggling on both the hardware and software sides to develop and fully utilize these future Tera-Scale platforms, the benefits and opportunities from putting these computing capabilities into the hands of all users are equally incredible.

So how many cores could you use, and what would you use them for? ArsTechnica user dg65536 said it best in his post – “Now that I think about it…80 isn't nearly enough.”

Justin Rattner is an Intel Senior Fellow and director of Intel's Corporate Technology Group. He also serves as the corporation's chief technology officer. The opinions expressed in this blog are his own and not those of his employer.
