Category: Memory
February 20th, 2007
Tera-Scale: What Would We Do with All These Cores and How Would We Feed Them?
Last week’s Tera-scale announcement at the International Solid State Circuits Conference (ISSCC) certainly created a lot of buzz in the press and on the Web. I have to admit being somewhat surprised by how extensively the story was picked up, not just in the technical press, but the popular press as well. From the many interviews I did, it was quite clear that people have an insatiable desire to know what their future computing devices will do and how soon they will do it. Fortunately, researchers at Intel and elsewhere have spent several years, not just thinking about the question, but actually building prototypes of those next-decade applications. Believe me when I say it’s much more credible to talk about a specific example than just blow some smoke and promise that whatever those applications are, they will be really cool.
Back to Recognition, Mining, and Synthesis
I first addressed the issue of why now is the time to create these ideas in my post Cool Codes in which I introduced the RMS categories. The important point is there is an entirely new breed of applications waiting to be invented that doesn’t simply benefit from Tera-scale performance, it requires it. Let me refresh you on RMS by talking about real-time motion capture and rendering and a few other examples to illustrate the idea.
Today, producing a Pixar-quality image takes about 6 hours of computing on a current-generation, dual-processor rack-mount server. That's to render one frame out of the 144,000 frames required for a feature-length, animated movie. How cool would it be if you could bring that quality of image rendering to your desktop in real-time? Imagine playing the Cars video game with imagery that's comparable to what you see in the theater. To create that user experience, we have to go from 6 hours per frame to 1/24th of a second per frame, but at least it’s a very well-characterized computational improvement. It will take a combination of teraFLOPS of computing power and huge advances in the algorithms that render the image. Note that synthesis is the “S” in RMS, and this is but one example.
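To put a number on that improvement, here is a quick back-of-the-envelope calculation using only the figures quoted above. The 24 frames-per-second target is the usual film rate and is what "1/24th of a second per frame" implies; the variable names are just for illustration.

```python
# Figures quoted in the post.
HOURS_PER_FRAME = 6          # offline render time for one frame today
FRAMES_PER_MOVIE = 144_000   # frames in a feature-length animated movie
TARGET_FPS = 24              # real-time target: one frame every 1/24 s

seconds_per_frame_today = HOURS_PER_FRAME * 3600

# Going from 21,600 s per frame to 1/24 s per frame.
speedup_needed = seconds_per_frame_today * TARGET_FPS

# Sanity checks on the other numbers in the paragraph.
movie_minutes = FRAMES_PER_MOVIE / TARGET_FPS / 60
total_render_hours = HOURS_PER_FRAME * FRAMES_PER_MOVIE

print(f"speedup needed for real-time: {speedup_needed:,}x")
print(f"movie length at {TARGET_FPS} fps: {movie_minutes:.0f} minutes")
print(f"render time on one server: {total_render_hours:,} hours")
```

The required speedup works out to 518,400x, roughly five and a half orders of magnitude, which is why faster hardware alone won't get us there and the algorithms have to improve as well.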
By the way, synthesis is not just about making pictures. It's making sounds, making things move and interact with one another in physically accurate ways. When an animated character speaks in these future desktop animations, their facial muscles will move exactly as they do when a real person speaks. It does raise the question of whether we’ll actually need actors at some point, but that’s a topic for another blog.
Here’s another example: Today in our labs we can data mine the imagery found in a recorded multi-camera video of an individual moving within a defined 3D space. The goal of this video stream mining is to extract their full body motion. We can’t quite do it in real-time at this point, but we are pretty close and there’s no need for marks or lights on the clothing or a background blue screen to do it. By the way, mining is the M in RMS.
Once we have the body motion information, we use it to animate a skeletal model of a human. It’s the skeletal model that makes sure we have the kinematics right and the motion is consistent with how people move. At that point, we can put the “skin on the bones” to create a fully synthetic person moving identically to the real one. Adding lights, shadows, and reflections to our little virtual world gives us a synthetic figure moving naturally and accurately within it.
If you started to think how the above technology could replace the Wii handheld remote controllers, you’ve got the idea. Future video entertainment will use full-body motion capture to put your virtual self in the game, dance instruction, or Tai Chi lesson.
Take out the Noise, Take out the Shake
Most of us have cassettes full of VHS quality (or worse) home video. When we put it up on our new 50-inch HD displays, it simply looks awful. Adding video cameras to cell phones has further exacerbated the problem. Fortunately, there is a way to rescue these old videos. The technique is called super-resolution and it takes advantage of the tremendous amount of redundancy in a video stream. Using statistical techniques, we can dramatically reduce camera shake, improve resolution, and fix a variety of other visual problems by exploiting all the extra information provided by each frame. Imagine being able to bring all your cell phone videos up to standard definition quality and reprocess those “obsolete” DVDs into high-definition DVDs. It’s a Tera-scale problem for sure, and the reconnaissance satellite folks have been doing it for years. It’s time to make it safe for home use.
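To see why redundancy across frames helps, here is a deliberately simplified sketch. It is not a super-resolution algorithm (real ones also register frames against each other and fuse sub-pixel detail); it only shows the statistical core of the idea, namely that averaging many noisy captures of the same scene shrinks the noise. The scene, noise level, frame count, and function names are all made up for illustration.

```python
import random
import statistics

def noisy_frames(true_row, n_frames, noise_sigma, rng):
    """Simulate n_frames captures of the same row of pixels, each with Gaussian noise."""
    return [[p + rng.gauss(0, noise_sigma) for p in true_row]
            for _ in range(n_frames)]

def average_frames(frames):
    """Fuse already-aligned frames by averaging each pixel position."""
    return [statistics.fmean(column) for column in zip(*frames)]

def rms_error(estimate, truth):
    """Root-mean-square difference between an estimate and the true pixels."""
    return (sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth)) ** 0.5

rng = random.Random(0)                                # fixed seed for repeatability
true_row = [float(i % 64) for i in range(2000)]       # hypothetical clean scanline
frames = noisy_frames(true_row, n_frames=16, noise_sigma=10.0, rng=rng)

single_error = rms_error(frames[0], true_row)
fused_error = rms_error(average_frames(frames), true_row)

print(f"RMS error, one frame:      {single_error:.2f}")
print(f"RMS error, 16-frame fusion: {fused_error:.2f}")
```

For independent noise, averaging N frames cuts the error by about the square root of N, so 16 frames should give roughly a 4x cleaner result. Doing this per pixel, with motion compensation, at video rates is where the Tera-scale compute requirement comes from.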
How Is It Possible to Feed Such a Beast?
Silent E was right in pointing out that memory capacity and bandwidth have to match or the cores will “starve” and users will not see the performance benefits. It’s relatively easy to pack a lot of processing power on a single chip. It’s much, much harder to provision the memory and I/O bandwidth to keep those processors productive. Fortunately, there are several approaches which promise to meet the future needs. Let me briefly mention two of them.
First, we need to bring more memory closer to the processors, and three approaches do this with varying degrees of bandwidth and capacity. The first is to use system-in-package (SIP) technology to place memory chips in the same package as the processor. Microsoft uses this approach in the Xbox 360. The next approach is to stack a memory chip underneath the processor, which is what we have planned as a future experiment with the Tera-scale Research Processor. Finally, there is embedding DRAM on the processor, as IBM described last week at ISSCC. Much work is required to decide which approach is best in a given situation, but the point is there is more than one solution.
Getting data on and off the chip is also a challenge. While we continue to push electrical signaling to higher and higher speeds, optical signaling is an increasingly attractive option. Costs are coming down and may decline even further when we move to silicon-based photonic solutions. If we can approach electrical costs, but still provide the flexibility and interference advantages of optical, we might just go optical. Once you make that transition, things look good out to about 10 terabits per second per fiber, which should keep us going for a little while to say the least.
Tera-scale keeps sounding more and more fun. Stay tuned as I continue to paint the complete picture. The blog is long overdue for a discussion of the programming challenges ahead.
December 4th, 2006
Mind the Gap
The last few months have been hectic to say the least. After the Intel Developer Forum in late September, I’ve been flying around the planet more or less non-stop. When I was in Europe and Russia last month, before heading to Japan and China just before Thanksgiving, a familiar phrase from the London Underground reminded me of a topic that I’ve wanted to blog about for some time – namely, closing the main memory – bulk storage latency gap that has plagued computer architecture for the last four decades.
Mind the Gap
At the fast end of the memory hierarchy, excluding on-chip caches, we have low-latency DRAM memory, but at over $100 per gigabyte, a lot of PCs still ship with only half a gigabyte. While a gig of DRAM may seem like a lot to those of us who can remember when the PDP-8 had only 4KB of core memory and a paper tape reader, a gigabyte is not nearly enough to hold my Outlook archive folders, a high-def movie, or a desktop search index file. I’m sure you share that frustrating feeling when you see your hard-drive light turn on and stay on as an application launches or more data gets paged into memory from disk.
Moving out one level in the hierarchy, magnetic disk has been the bulk storage technology of choice for decades. While disk continues to grow in capacity with relatively fixed cost, those capacity improvements have not been matched with similar reductions in random access latency. Over the past 10 years alone, processor performance has increased by over 30X while measured hard-drive performance has increased by only 1.3X. And, the gap will continue to grow as processor performance scaling moves to the new multi-core trajectory.
To put a finer point on it, we’ve had to make do with a factor of 100,000 difference between DRAM and HDD performance (random read latency of 150 nanoseconds vs. 15 milliseconds) and about two orders of magnitude in cost per bit for equivalent capacity. The trade-off between main memory and hard disk performance and cost affects system design and software design in fundamental and profound ways.
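The numbers above are easy to check. The short calculation below uses only the figures quoted in this post; since the processor gain is stated as "over 30X", the derived growth figures should be read as lower bounds.

```python
# Random read latency, in nanoseconds, as quoted in the post.
DRAM_NS = 150
HDD_NS = 15_000_000          # 15 milliseconds

latency_ratio = HDD_NS // DRAM_NS

# Ten-year performance gains, as quoted in the post.
CPU_GAIN = 30.0
HDD_GAIN = 1.3
YEARS = 10

# How much wider the processor/disk gap became over the decade.
relative_gap_growth = CPU_GAIN / HDD_GAIN

# Equivalent compound annual improvement for each technology.
cpu_annual = CPU_GAIN ** (1 / YEARS)
hdd_annual = HDD_GAIN ** (1 / YEARS)

print(f"HDD vs. DRAM random read latency: {latency_ratio:,}x")
print(f"gap widened by about {relative_gap_growth:.0f}x in {YEARS} years")
print(f"annual improvement: CPU {cpu_annual - 1:.1%}, HDD {hdd_annual - 1:.1%}")
```

In other words, processors improved at roughly 40 percent per year while disks improved at under 3 percent per year, which is why the gap keeps getting harder to hide.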
Coping with the Gap
Minding the gap means application developers must constantly manage the placement of data. They need to anticipate huge latency hits that can occur in a seemingly random fashion when a desired datum is not in memory. And, they need to anticipate the different target system configurations which will have a direct bearing on how the user perceives application performance.
OS developers have struggled with the gap for decades and have had some modest success hiding it. Virtual memory was invented to relieve application developers of the hassle of managing overlays, but it is very easy to push the notion of virtual memory too far. Push the ratio of virtual to physical memory too high and “a thrashing we will go” as paging rates turn exponential. The tendency of virtual memory systems to exhibit such poor performance when configured with too little DRAM has given rise to the belief that, “virtual memory is a great idea as long as you never use it.” Fortunately, Moore’s law has made doubling the DRAM in the system the usually affordable fix when the disk activity light never seems to go out.
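The reason thrashing hurts so badly falls out of the standard effective-access-time formula. The sketch below plugs in the DRAM and disk latencies quoted earlier; the fault rates are arbitrary sample values, and the model ignores caches, queuing, and sequential prefetch, so treat it as an illustration rather than a performance prediction.

```python
DRAM_NS = 150            # random read latency from the post
DISK_NS = 15_000_000     # 15 ms random read latency from the post

def effective_access_ns(fault_rate):
    """Average memory access time when a fraction fault_rate of accesses go to disk."""
    return (1 - fault_rate) * DRAM_NS + fault_rate * DISK_NS

def slowdown(fault_rate):
    """How many times slower than pure DRAM the average access becomes."""
    return effective_access_ns(fault_rate) / DRAM_NS

for rate in (0.0, 1e-6, 1e-5, 1e-4, 1e-3, 1e-2):
    print(f"fault rate {rate:8.6f} -> {slowdown(rate):10.1f}x slower than DRAM")
```

With a 100,000x latency ratio, just one page fault per 100,000 accesses is enough to roughly halve effective memory speed, and one per hundred makes the machine about a thousand times slower. That is the arithmetic behind "a great idea as long as you never use it."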
It should come as no surprise that the search for a “gap filling” memory technology has been going on for decades. When I joined Intel three decades ago, we explored (at some considerable expense) magnetic bubble memory and later charge-coupled device (CCD) memory. Neither turned out to be dense enough or cheap enough to replace rotating magnetic storage. Many other technologies (e.g. holographic memory and, more recently, polymer memory) have been heralded over the years as being the long sought “gap filler” that will be a bit slower, but much cheaper than DRAM. Unfortunately, none of these widely-trumpeted devices panned out.
And the Winner Is?
What is a surprise is that a relatively unheralded technology, NAND flash memory, the same stuff you find in your digital music player or digital camera, looks like it may be the long-sought “gap filler” even though most people had given up looking. There are two approaches to bringing NAND into the memory hierarchy: so-called NAND disks and platform NAND, where the flash memory is integrated onto the motherboard. Let me leave the NAND disk approach for another blog while I focus on platform NAND for this posting. [Note: I have to slightly violate my promise not to tout future Intel products in this blog, but I’ll try to keep my enthusiasm, which is substantial, well in check.] Platform NAND currently goes by the code name Robson Technology at Intel and is slated for introduction with the next-generation mobile platform, codenamed Santa Rosa, in the first half of 2007.
In its initial configurations, Robson consists of up to 1 GB of NAND flash memory and an intelligent controller that fit either on a PCIe mini-card or directly down on the motherboard. In its Robson configuration, the NAND memory is used as a disk cache to temporarily store both applications and data. Since NAND has latency characteristics in the range of tens of microseconds and is non-volatile (maintains the memory image even when power is removed), it enables near “instant” resume from hibernation and applications launch 2X faster on average on Windows Vista. We also see lower overall platform energy consumption as the hard-drive spins up less frequently. The fact that NAND is typically 7X cheaper than DRAM doesn’t hurt either and makes Robson an excellent technology for filling the gap. Note that I say Robson and not NAND, because using plain NAND flash isn’t good enough to do the job on its own.
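To make the disk-cache idea concrete, here is a toy simulation. It is not how Robson actually decides what to cache, which this post does not describe; it is a plain least-recently-used (LRU) read cache, and the class name, the 50-microsecond NAND latency (one reading of "tens of microseconds"), the cache size, and the access trace are all assumptions chosen for illustration. The cost of writing a missed block into flash is ignored.

```python
from collections import OrderedDict

NAND_US = 50        # assumption: one value within "tens of microseconds"
DISK_US = 15_000    # 15 ms random read, as quoted earlier in the post

class FlashDiskCache:
    """Hypothetical LRU read cache of disk blocks held in flash."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()   # block id -> placeholder, oldest first
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        """Return the simulated latency, in microseconds, of reading one block."""
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)    # mark as most recently used
            self.hits += 1
            return NAND_US
        self.misses += 1
        self.blocks[block_id] = True             # fill the cache from disk
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)      # evict the least recently used
        return DISK_US

cache = FlashDiskCache(capacity_blocks=4)
trace = [1, 2, 3, 1, 2, 3, 4, 5, 1, 2]           # made-up block access pattern
total_us = sum(cache.read(block) for block in trace)
uncached_us = DISK_US * len(trace)

print(f"hits={cache.hits} misses={cache.misses}")
print(f"with cache: {total_us} us, without: {uncached_us} us")
```

Every hit replaces a 15-millisecond seek with a read that is hundreds of times faster, and because flash is non-volatile the cache contents survive a power cycle, which is what makes fast resume possible.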
Overcoming the Weakness of NAND Flash
The one big issue with NAND as a gap filler is write endurance: NAND flash only supports a limited number of erasure cycles before wearing out. That’s where Robson’s smart controller comes into play. Simply put, it uses clever wear-leveling algorithms to spread the block erasures evenly across the array, giving the NAND flash memory a service life consistent with the rest of the platform.
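Here is a minimal sketch of the leveling idea, assuming the simplest possible policy: every rewrite of a logical block goes to whichever free physical block has been erased the fewest times. The actual algorithms in Robson's controller are not described in this post, so the class, its method, and the block counts below are purely hypothetical.

```python
class WearLeveler:
    """Hypothetical controller that steers each write to the least-worn free block."""

    def __init__(self, n_physical):
        self.erase_counts = [0] * n_physical
        self.mapping = {}                       # logical block -> physical block
        self.free = set(range(n_physical))      # physical blocks ready to program

    def write(self, logical):
        # Choose the free block with the fewest erasures (ties go to the lowest id).
        # Assumes there are more physical blocks than logical blocks in use,
        # so the free set is never empty.
        target = min(self.free, key=lambda b: (self.erase_counts[b], b))
        self.free.remove(target)
        old = self.mapping.get(logical)
        if old is not None:
            # The stale copy has to be erased before its block can be reused.
            self.erase_counts[old] += 1
            self.free.add(old)
        self.mapping[logical] = target

leveler = WearLeveler(n_physical=8)
for _ in range(800):
    leveler.write(logical=0)                    # hammer a single logical block

print("erasures per physical block:", leveler.erase_counts)
```

Without remapping, 800 rewrites of one logical block would put every erasure on the same physical block. With this policy the erasures end up spread across all eight blocks to within one cycle of each other, so the array wears out about eight times more slowly under this worst-case workload.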
The use of NAND as a disk cache is just the start of a major overhaul of the memory hierarchy. Samsung recently announced notebooks that use NAND to create a solid-state drive, completely eliminating the hard-drive. Further out in time, Intel and others are exploring technologies, such as phase-change memory (PCM), as a replacement for NAND flash. It’s too early to tell if PCM will go the way of magnetic bubble memory or if it will replace NAND flash, but the race is on for the future of non-volatile solid-state memory.
In the not too distant future, we can expect to see magnetic disk drives relegated to the role that tape drives play today, and even DIMMs may vanish from future motherboards. I’ll say more about that in another blog.
These changes will require us to rethink software architecture and implementation, including tuning of the operating system, drivers and applications. But the benefits are so tangible that the course is set and now the work must get done.
Going forward, let’s not just mind the memory/storage gap. It’s time to close the gap for good.
Justin Rattner is an Intel Senior Fellow and director of Intel's Corporate Technology Group. He also serves as the corporation's chief technology officer. The opinions expressed in this blog are his own and not those of his employer.