Massively Parallel Databases and MapReduce Systems

Timely and cost-effective analytics over "big data" has emerged as a key ingredient for success in many businesses, scientific and engineering disciplines, and government endeavors. Web clicks, social media, scientific experiments, and datacenter monitoring are among the data sources that generate vast amounts of raw data every day. The need to convert this raw data into useful information has spawned considerable innovation in systems for large-scale data analytics, especially over the last decade. This monograph covers the design principles and core features of systems for analyzing very large datasets using massively parallel computation and storage techniques on large clusters of nodes. We first discuss how the requirements of data analytics have evolved since the early work on parallel database systems. We then describe some of the major technological innovations that have each spawned a distinct category of systems for data analytics. Each system category is described along a number of dimensions, including data model and query interface, storage layer, execution engine, query optimization, scheduling, resource management, and fault tolerance. We conclude with a summary of present trends in large-scale data analytics.
