March 18th, 2009

Greenplum aims to eliminate massive data load 'choke points' with Scatter/Gather technology

Posted by Dana Gardner @ 12:15 pm

Categories: BI, Software Development, Software Infrastructure, business intelligence, datacenters, management

Tags: Data, Node, Greenplum, Storage, Databases, Hardware, Enterprise Software, Software, Data Management, Dana Gardner

Greenplum has taken massively parallel processing (MPP) of data to the next level with the introduction this week of its “MPP Scatter/Gather Streaming” (SG Streaming) technology, which manages the flow of data into all nodes of the database and eliminates the traditional bottlenecks of massive data loading.

The San Mateo, Calif., company, which provides large-scale analytics and data warehousing, says SG Streaming has allowed customers to achieve production-loading speeds of over four terabytes per hour with negligible impact on concurrent database operations. [Disclosure: Greenplum is a sponsor of BriefingsDirect podcasts.]

Under the “parallel everywhere” approach to loading, data flows from one or more source systems to every node of the database without any sequential choke points. This differs from the traditional “bulk loading” technologies used by most mainstream database and parallel-processing appliance vendors, which push data from a single source, often over a single channel or a small number of parallel channels, and result in fundamental bottlenecks and ever-increasing load times.

The new technology “scatters” data from all source systems across hundreds or thousands of parallel streams that simultaneously flow to all nodes of the database. Performance scales with the number of nodes, and the technology supports both large batch and continuous near-real-time loading patterns with negligible impact on concurrent database operations.
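The scatter step described above can be illustrated with a toy sketch. This is not Greenplum's implementation: the names `node_for`, `scatter`, `load_node`, and `parallel_load`, the CRC32-based routing, and the four-node default are all assumptions made for illustration. Each source runs in its own thread and routes rows directly to per-node streams, so no single channel carries all of the data.

```python
import queue
import threading
import zlib

NUM_NODES = 4  # hypothetical cluster size, chosen only for the example


def node_for(key: str, num_nodes: int) -> int:
    """Choose a target node with a stable hash of the row key (assumed policy)."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def scatter(source_rows, streams):
    """Send every row from one source straight to the stream of its target node."""
    for key, value in source_rows:
        streams[node_for(key, len(streams))].put((key, value))


def load_node(stream, storage):
    """Drain one node's stream into that node's local storage until told to stop."""
    while True:
        row = stream.get()
        if row is None:  # sentinel: all sources have finished
            break
        storage.append(row)


def parallel_load(sources, num_nodes=NUM_NODES):
    """Run all sources and all node loaders concurrently; return per-node storage."""
    streams = [queue.Queue() for _ in range(num_nodes)]
    storage = [[] for _ in range(num_nodes)]
    loaders = [
        threading.Thread(target=load_node, args=(streams[i], storage[i]))
        for i in range(num_nodes)
    ]
    scatterers = [
        threading.Thread(target=scatter, args=(rows, streams)) for rows in sources
    ]
    for t in loaders + scatterers:
        t.start()
    for t in scatterers:
        t.join()
    for s in streams:
        s.put(None)  # one sentinel per node once every source is done
    for t in loaders:
        t.join()
    return storage
```

Calling `parallel_load` with two or three lists of `(key, value)` pairs returns one list per node. Adding nodes adds streams and loaders, which is the sense in which throughput in such a design can grow with node count.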

Data can be transformed and processed in-flight, using all nodes of the database in parallel, for extremely high-performance extract-load-transform (ELT) and extract-transform-load-transform (ETLT) loading pipelines. Final “gathering” and storage of data to disk takes place on all nodes simultaneously, with data automatically partitioned across nodes and optionally compressed.
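A minimal sketch of in-flight transformation followed by a per-node gather, again as an illustration rather than Greenplum's actual mechanism. The row fields (`user`, `amount`), the `transform` rule, the JSON serialization, and the use of `zlib` for the optional compression are hypothetical choices; bytes objects stand in for on-disk storage.

```python
import json
import zlib
from concurrent.futures import ThreadPoolExecutor


def transform(row: dict) -> dict:
    """Example in-flight transform: normalize a name and convert to cents."""
    return {
        "user": row["user"].strip().lower(),
        "amount_cents": round(row["amount"] * 100),
    }


def gather_on_node(rows, compress=True) -> bytes:
    """Transform one node's partition and produce the bytes it would store."""
    payload = json.dumps([transform(r) for r in rows]).encode("utf-8")
    return zlib.compress(payload) if compress else payload


def load(rows, num_nodes=4, compress=True):
    """Partition rows across nodes, then transform and gather on all nodes at once."""
    partitions = [[] for _ in range(num_nodes)]
    for row in rows:
        # Assumed policy: route by a stable hash of the raw user field.
        target = zlib.crc32(row["user"].encode("utf-8")) % num_nodes
        partitions[target].append(row)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        return list(pool.map(lambda part: gather_on_node(part, compress), partitions))


def read_node(blob: bytes, compressed=True):
    """Read back what a node stored, reversing the optional compression."""
    payload = zlib.decompress(blob) if compressed else blob
    return json.loads(payload)
```

Because the transform runs inside each node's worker rather than on a separate staging server, the work is spread over every node, which is the idea behind the ELT and ETLT pipelines mentioned above.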

It was just six months ago that Greenplum publicly unveiled how it wrapped MapReduce approaches into the newest version of its data solution. That advance allowed users to combine SQL queries and MapReduce programs into unified tasks executed in parallel across thousands of cores.

Dana Gardner is principal analyst of Interarbor Solutions.
