Category: Software
February 20th, 2007
Tera-Scale: What Would We Do with All These Cores and How Would We Feed Them?
Last week’s Tera-scale announcement at the International Solid State Circuits Conference (ISSCC) certainly created a lot of buzz in the press and on the Web. I have to admit being somewhat surprised by how extensively the story was picked up, not just in the technical press, but the popular press as well. From the many interviews I did, it was quite clear that people have an insatiable desire to know what their future computing devices will do and how soon they will do it. Fortunately, researchers at Intel and elsewhere have spent several years, not just thinking about the question, but actually building prototypes of those next-decade applications. Believe me when I say it’s much more credible to talk about a specific example than just blow some smoke and promise that whatever those applications are, they will be really cool.
Back to Recognition, Mining, and Synthesis
I first addressed the issue of why now is the time to create these ideas in my post Cool Codes, in which I introduced the RMS categories. The important point is that there is an entirely new breed of applications waiting to be invented that doesn’t simply benefit from Tera-scale performance; it requires it. Let me refresh you on RMS by talking about real-time motion capture and rendering and a few other examples to illustrate the idea.
Today, producing a Pixar-quality image takes about 6 hours of computing on a current-generation, dual-processor rack-mount server. That's to render one frame out of the 144,000 frames required for a feature-length animated movie. How cool would it be if you could bring that quality of image rendering to your desktop in real time? Imagine playing the Cars video game with imagery that's comparable to what you see in the theater. To create that user experience, we have to go from 6 hours per frame to 1/24th of a second per frame, but at least it’s a very well-characterized computational improvement. It will take a combination of teraFLOPS of computing power and huge advances in the algorithms that render the image. Note that synthesis is the “S” in RMS, and this is but one example.
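To put a number on that improvement, here is a back-of-the-envelope sketch in Python. It uses only the figures quoted above and reads the real-time target as 24 frames per second; the function names are illustrative, not part of any Intel tool.

```python
SECONDS_PER_HOUR = 3600


def required_speedup(hours_per_frame, frames_per_second):
    """Factor by which per-frame render time must shrink to reach real time."""
    seconds_per_frame_today = hours_per_frame * SECONDS_PER_HOUR
    # Real time allows 1 / frames_per_second seconds per frame.
    return seconds_per_frame_today * frames_per_second


def film_runtime_minutes(frames, frames_per_second):
    """Running time of a film with the given number of frames."""
    return frames / frames_per_second / 60


def offline_render_hours(frames, hours_per_frame):
    """Total single-server compute time to render every frame offline."""
    return frames * hours_per_frame


if __name__ == "__main__":
    print(required_speedup(6, 24))            # 518400
    print(film_runtime_minutes(144_000, 24))  # 100.0
    print(offline_render_hours(144_000, 6))   # 864000
```

In other words, the gap between 6 hours and 1/24th of a second is a factor of roughly half a million, and the 144,000 frames correspond to a 100-minute film that would take 864,000 server-hours to render on a single machine.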
By the way, synthesis is not just about making pictures. It's making sounds, making things move and interact with one another in physically accurate ways. When an animated character speaks in these future desktop animations, their facial muscles will move exactly as they do when a real person speaks. It does raise the question of whether we’ll actually need actors at some point, but that’s a topic for another blog.
Here’s another example: Today in our labs we can data mine the imagery found in a recorded multi-camera video of an individual moving within a defined 3D space. The goal of this video stream mining is to extract their full-body motion. We can’t quite do it in real time at this point, but we are pretty close, and there’s no need for markers or lights on the clothing or a background blue screen to do it. By the way, mining is the “M” in RMS.
Once we have the body motion information, we use it to animate a skeletal model of a human. It’s the skeletal model that makes sure we have the kinematics right and the motion is consistent with how people move. At that point, we can put the “skin on the bones” to create a fully synthetic person moving identically to the real one. Adding lights, shadows, and reflections to our little virtual world gives us a synthetic figure moving naturally and accurately within it.
If you've started to think about how the above technology could replace the Wii handheld remote controllers, you’ve got the idea. Future video entertainment will use full-body motion capture to put your virtual self in the game, dance instruction, or Tai Chi lesson.
Take out the Noise, Take out the Shake
Most of us have cassettes full of VHS-quality (or worse) home video. When we put it up on our new 50-inch HD displays, it simply looks awful. Adding video cameras to cell phones has further exacerbated the problem. Fortunately, there is a way to rescue these old videos. The technique is called super-resolution and it takes advantage of the tremendous amount of redundancy in a video stream. Using statistical techniques, we can dramatically reduce camera shake, improve resolution, and fix a variety of other visual problems by exploiting all the extra information provided by each frame. Imagine being able to bring all your cell phone videos up to standard-definition quality and reprocess those “obsolete” DVDs into high-definition DVDs. It’s a Tera-scale problem for sure, and the reconnaissance satellite folks have been doing it for years. It’s time to make it safe for home use.
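To see why redundancy across frames helps, consider the simplest possible case: several noisy captures of the same static scene that are already perfectly aligned. The Python sketch below is only an illustration of that statistical idea, not the super-resolution method itself; real systems must also estimate motion and align frames at sub-pixel precision before they can recover extra resolution. The pixel values, noise level, and function names are all assumptions made for the example.

```python
import random
import statistics


def noisy_frames(true_pixels, frame_count, noise_sigma, rng):
    """Simulate repeated, already-aligned captures of one static scene."""
    return [[p + rng.gauss(0.0, noise_sigma) for p in true_pixels]
            for _ in range(frame_count)]


def average_frames(frames):
    """Per-pixel mean across all frames."""
    return [statistics.fmean(column) for column in zip(*frames)]


def rms_error(estimate, truth):
    """Root-mean-square difference between an estimate and the true scene."""
    squared = sum((e - t) ** 2 for e, t in zip(estimate, truth))
    return (squared / len(truth)) ** 0.5


rng = random.Random(0)                          # fixed seed for repeatability
truth = [float(i % 256) for i in range(2000)]   # hypothetical one-row "scene"
frames = noisy_frames(truth, frame_count=16, noise_sigma=10.0, rng=rng)

single_error = rms_error(frames[0], truth)
averaged_error = rms_error(average_frames(frames), truth)

if __name__ == "__main__":
    print(f"single frame RMS error:  {single_error:.2f}")
    print(f"16-frame average error:  {averaged_error:.2f}")
```

With independent noise, averaging N frames reduces the error by about the square root of N, so 16 frames should cut it to roughly a quarter. Doing this with motion compensation for every pixel of every frame is where the computational cost comes from.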
How Is It Possible to Feed Such a Beast?
Silent E was right in pointing out that memory capacity and bandwidth have to match or the cores will “starve” and users will not see the performance benefits. It’s relatively easy to pack a lot of processing power on a single chip. It’s much, much harder to provision the memory and I/O bandwidth to keep those processors productive. Fortunately, there are several approaches which promise to meet the future needs. Let me briefly mention two of them.
First, we need to bring more memory closer to the processors, and three approaches do this with varying degrees of bandwidth and capacity. The first is to use system-in-package (SIP) technology to place memory chips in the same package as the processor. Microsoft uses this approach in the Xbox 360. The next approach is to stack a memory chip underneath the processor, which is what we have planned as a future experiment with the Tera-scale Research Processor. Finally, there is embedding DRAM on the processor, as IBM described last week at ISSCC. Much work is required to decide which approach is best in a given situation, but the point is there is more than one solution.
Getting data on and off the chip is also a challenge. While we continue to push electrical signaling to higher and higher speeds, optical signaling is an increasingly attractive option. Costs are coming down and may decline even further when we move to silicon-based photonic solutions. If we can approach electrical costs, but still provide the flexibility and interference advantages of optical, we might just go optical. Once you make that transition, things look good out to about 10 terabits per second per fiber, which should keep us going for a little while to say the least.
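The "starving" problem can be made concrete with a simple bound, often called a roofline-style estimate: achievable throughput is capped either by peak compute or by bandwidth times the number of operations performed per byte moved. The Python sketch below is an illustration under stated assumptions, not a model of any specific chip. The 1 teraFLOPS peak, the 100 GB/s memory system, and the 4 operations per byte are hypothetical numbers; only the 10 terabits per second per fiber figure comes from the text.

```python
TERA = 1e12
GIGA = 1e9


def attainable_flops(peak_flops, bandwidth_bytes_per_s, flops_per_byte):
    """Upper bound on throughput: compute-limited or bandwidth-limited."""
    return min(peak_flops, bandwidth_bytes_per_s * flops_per_byte)


def bandwidth_to_saturate(peak_flops, flops_per_byte):
    """Bytes per second needed to keep every core busy."""
    return peak_flops / flops_per_byte


peak = 1.0 * TERA     # hypothetical 1 teraFLOPS processor
intensity = 4.0       # hypothetical workload: 4 operations per byte moved

# A hypothetical 100 GB/s memory system leaves the cores starved.
starved = attainable_flops(peak, 100 * GIGA, intensity)

# Bandwidth this workload needs to reach peak.
needed = bandwidth_to_saturate(peak, intensity)

# One optical fiber at 10 terabits per second, converted to bytes per second.
fiber_bytes_per_s = 10 * TERA / 8
fed_by_fiber = attainable_flops(peak, fiber_bytes_per_s, intensity)

if __name__ == "__main__":
    print(f"starved throughput: {starved / TERA:.2f} teraFLOPS")
    print(f"bandwidth needed:   {needed / GIGA:.0f} GB/s")
    print(f"with one fiber:     {fed_by_fiber / TERA:.2f} teraFLOPS")
```

Under these assumptions the 100 GB/s system delivers only 40 percent of peak, 250 GB/s is needed to saturate the cores, and a single 10 terabit link (1.25 TB/s) would be more than enough if all operands streamed over it. Workloads that do fewer operations per byte push the required bandwidth up proportionally.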
Tera-scale keeps sounding more and more fun. Stay tuned as I continue to paint the complete picture. The blog is long overdue for a discussion of the programming challenges ahead.
September 1st, 2006
Cool Codes
Monday of last week was one of those “convergence” days. I’m sure you know the feeling. Besides being my 29th wedding anniversary, it was the first day of the Hot Chips conference at Stanford University. Before my wife and I drove out to Half Moon Bay to celebrate, I was on stage at Mem Aud to give the opening keynote of the conference with a talk entitled Cool Codes for Hot Chips and to announce a new multi-core applications initiative. I’ll come back to the latter item in a moment.
The theme of my keynote was very much related to the question I raised in my last post – have we reached the end of applications or are we at the start of a new wave of innovation? Even though many of your comments had assumed I was in the opposite camp, I firmly believe that we are sitting on a plateau just waiting for the next order-of-magnitude leap in computer (and communication) performance and capability to unleash a new age of application innovation.
To get off this application plateau we have to have access to some radically better hardware. Unfortunately, the hardware won’t happen unless the architects (and their bosses) believe there will be software to take advantage of the new hardware. To resolve this chicken-and-egg question, we need to start building and testing working prototypes of these future applications. That’s what we’ve been doing at Intel for the last three years, and I took the opportunity at Hot Chips to call for a community-wide effort along the same lines.
A collection of future applications, ones that take today’s systems beyond their limits, would serve two purposes. First, it would help stimulate much more thinking about what can be and should be done. More programmers would pick up the challenge and start thinking more expansively about the future. Second, it would give architects and engineers a set of working prototype applications against which to evaluate the efficiency and programmability of their new designs.
Let me share one of the demos that I used at Hot Chips as an example of what’s possible if one has the necessary processing power.
Here’s the basic recipe (click on an image to see the video in action):
- Take input from four cameras located in the corners of a room (Fig. 1a)
- Analyze the video streams to extract the location and motion over time of the individual body parts (torso, arms, legs and head) based on a programmed skeletal model
- Animate a synthetic human figure with skin using ray-tracing and global illumination within a virtual scene based on the actual kinematics determined in step 2 (Fig. 1b)
While live-action movie animations usually sprinkle LEDs over actors wearing dark clothes and then just track the bright lights, the Intel system works without any special markers on the person. You literally walk into the camera-equipped room and it just works.
The applications for this technology are wide open beyond the obvious ones in game play: you might compare your golf swing to that of Tiger Woods or see how you look walking or even dancing in a new outfit without ever putting it on. Given the model has your physical information, you’d know if you need the next size up or if the color isn’t quite right given your skin tone.
This system is appealing to us not because Intel is planning to ship one of these applications, but because it points to a broad new class of algorithms that we refer to as “recognition, mining and synthesis” or RMS.
The recognition stage answers the question “what is it?” – modeling of the body in our prototype system. Mining answers the question “where is it?” – analyzing the video streams to find similar instances of the model. And synthesis answers the question “how is it?” – creating a new instance of the model in some virtual world.
This flow between recognition, mining and synthesis applies beyond the entertainment and visual domains. It works equally well in domains as diverse as medicine, finance, and astrophysics.
Such emerging “killer apps” of the future have a few important attributes in common – they are highly parallel in nature, they are built from a common set of algorithms, and they have, by today’s standards, extreme computational and memory bandwidth requirements, often requiring teraFLOPS of computing power and terabytes per second of memory bandwidth, respectively. Unfortunately, the R&D community lacks a suite of these emerging, highly scalable workloads to guide the quantitative design of our future computing systems.
The Intel RMS suite I mentioned earlier is based on a mix of internally developed codes, such as the body tracking and animation prototype, and partner-developed codes from some of the brightest minds in industry and academia. As researchers outside of Intel learned more about the suite, they started to ask if we could make it publicly available. Since it contains a mix of Intel and non-Intel code, we couldn’t just release it as open source. A conversation last spring about the suite with my good friend Professor Kai Li of Princeton gave rise to the idea of a new publicly available suite, and my Hot Chips keynote gave me the opportunity to engage the technical community in its development.
At the end of the keynote I announced the creation of a publicly available suite of killer codes for future multi-core architecture research. I also announced that Intel would contribute some of our internally-developed codes in body-tracking and real-time ray tracing to launch the effort. I was also pleased to announce that Professor Ron Fedkiw at Stanford will contribute his physics codes, the University of Pittsburgh Medical Center will add their medical image analysis codes, Professor David Patterson at UC Berkeley will provide codes of the “Seven+ Dwarfs of Parallel Computing”, and Professors Li and JP Singh at Princeton will make additional network and I/O intensive contributions including content-based multimedia search, network traffic processing, and databases.
Professors Li and Singh have graciously offered to manage contributions to the suite and host the repository. A workshop is being arranged for early next year to establish some guidelines on contributions. I’ll provide more information here as the date gets closer.
And that brings me back to the question of when we will have the computational capability to break free from today’s rather quaint applications. Sooner than most people think, if we come together to create the future.
August 10th, 2006
The end of applications?
Sometimes someone says something at a conference that really knocks me for a loop. Such was the case at the High Performance Computer Architecture Conference last year. In typical panel fashion, a group of us were each given a few minutes to state our position on the future of computer architecture.
The panelists were chosen to represent a broad spectrum of architectural views from the traditional (x86) to the more radical (Cell) along with a software viewpoint. The hardware panelists more or less stuck to their respective party lines, but the software speaker said something that I won’t soon forget: “Since all of the interesting applications have been written, why is it that you guys are still inventing new architectures? What IT managers want now is just lower cost hardware and easier to manage systems. That’s what you should be working on!”
…it becomes harder and harder for developers to build, let alone imagine, applications with dramatically new capabilities.
Now I like a provocative panelist as much as anyone, but I just couldn’t swallow the line about the end of applications. I’m squarely in the camp that believes that the truly compelling computer applications have yet to be built.
At first I put the applications comment under the same heading as other famously wrong-headed thoughts about computing such as “only six electronic digital computers would be required to satisfy the computing needs of the entire United States” (Howard Aiken) and “there is no reason anyone would want a computer in their home” (Ken Olsen). The more I thought about it,
Justin Rattner is an Intel Senior Fellow and director of Intel's Corporate Technology Group. He also serves as the corporation's chief technology officer. The opinions expressed in this blog are his own and not those of his employer.