Memory-efficient groupby-aggregate using compressed buffer trees

The rapid growth of fast analytics systems, that require data processing in memory, makes memory capacity an increasingly-precious resource. This paper introduces a new compressed data structure called a Compressed Buffer Tree (CBT). Using a combination of techniques including buffering, compression, and serialization, CBTs improve the memory efficiency and performance of the GroupBy-Aggregate abstraction that forms the basis of not only batch-processing models like MapReduce, but recent fast analytics systems too. For streaming workloads, aggregation using the CBT uses 21--42% less memory than using Google SparseHash with up to 16% better throughput. The CBT is also compared to batch-mode aggregators in MapReduce runtimes such as Phoenix++ and Metis and consumes 4x and 5x less memory with 1.5--2x and 3--4x more performance respectively.

[1]  Goetz Graefe,et al.  Query evaluation techniques for large databases , 1993, CSUR.

[2]  Joseph M. Hellerstein,et al.  Online aggregation and continuous query support in MapReduce , 2010, SIGMOD Conference.

[3]  Robert L. Grossman,et al.  Sector and Sphere: the design and implementation of a high-performance data cloud , 2009, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences.

[4]  Luigi Rizzo A very fast algorithm for RAM compression , 1997, OPSR.

[5]  Johannes Gehrke,et al.  Query optimization in compressed database systems , 2001, SIGMOD '01.

[6]  Michael A. Bender,et al.  Cache-oblivious streaming B-trees , 2007, SPAA '07.

[7]  Gerth Stølting Brodal,et al.  Worst-Case Efficient External-Memory Priority Queues , 1998 .

[8]  Prashant J. Shenoy,et al.  A platform for scalable one-pass analytics using MapReduce , 2011, SIGMOD '11.

[9]  Thomas F. Wenisch,et al.  Disaggregated memory for expansion and sharing in blade servers , 2009, ISCA '09.

[10]  Diana Inkpen,et al.  Real-Word Spelling Correction using Google Web 1T 3-grams , 2009, EMNLP.

[11]  Antonio Torralba,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 80 Million Tiny Images: a Large Dataset for Non-parametric Object and Scene Recognition , 2022 .

[12]  Christoforos E. Kozyrakis,et al.  Evaluating MapReduce for Multi-core and Multiprocessor Systems , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[13]  Benjamin Rose,et al.  CellMR: A framework for supporting mapreduce on asymmetric cell-based clusters , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.

[14]  Jason Evans April A Scalable Concurrent malloc(3) Implementation for FreeBSD , 2006 .

[15]  Scott Shenker,et al.  Spark: Cluster Computing with Working Sets , 2010, HotCloud.

[16]  Nick Roussopoulos,et al.  An alternative storage organization for ROLAP aggregate views based on cubetrees , 1998, SIGMOD '98.

[17]  Haibo Chen,et al.  Tiled-MapReduce: Optimizing resource usages of data-parallel applications on multicore with tiling , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[18]  M. DePristo,et al.  The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. , 2010, Genome research.

[19]  Kunle Olukotun,et al.  Map-Reduce for Machine Learning on Multicore , 2006, NIPS.

[20]  Yao Zhang,et al.  Parallel lossless data compression on the GPU , 2012, 2012 Innovative Parallel Computing (InPar).

[21]  Vishal Monga,et al.  Robust perceptual image hashing using feature points , 2004, 2004 International Conference on Image Processing, 2004. ICIP '04..

[22]  Kurt Mehlhorn,et al.  A new data structure for representing sorted lists , 1982, Acta Informatica.

[23]  Michael Isard,et al.  Distributed aggregation for data-parallel computing: interfaces and implementations , 2009, SOSP '09.

[24]  D. E. Vengro A transparent parallel I/O environment , 1994 .

[25]  Robert Morris,et al.  Optimizing MapReduce for Multicore Architectures , 2010 .

[26]  Lars Arge,et al.  The Buffer Tree: A Technique for Designing Batched External Data Structures , 2003, Algorithmica.

[27]  Jinyang Li,et al.  Piccolo: Building Fast, Distributed Programs with Partitioned Tables , 2010, OSDI.

[28]  Ivet Bahar,et al.  The relationship between N‐gram patterns and protein secondary structure , 2007, Proteins.

[29]  Bin Fan,et al.  SILT: a memory-efficient, high-performance key-value store , 2011, SOSP.

[30]  Lars Arge,et al.  External Memory Data Structures , 2001, ESA.

[31]  José B. Mariño,et al.  N-gram-based Machine Translation , 2006, CL.

[32]  Wolfgang Gerlach,et al.  Engineering a compressed suffix tree implementation , 2007, JEAL.

[33]  Kathryn S. McKinley,et al.  Hoard: a scalable memory allocator for multithreaded applications , 2000, SIGP.

[34]  Patrick E. O'Neil,et al.  The log-structured merge-tree (LSM-tree) , 1996, Acta Informatica.

[35]  Ting Liu,et al.  Clustering Billions of Images with Large Scale Nearest Neighbor Search , 2007, 2007 IEEE Workshop on Applications of Computer Vision (WACV '07).

[36]  Lu Liu,et al.  Muppet: MapReduce-Style Processing of Fast Data , 2012, Proc. VLDB Endow..

[37]  Jeffrey Scott Vitter,et al.  External memory algorithms and data structures: dealing with massive data , 2001, CSUR.

[38]  Páll Melsted,et al.  Efficient counting of k-mers in DNA sequences using a bloom filter , 2011, BMC Bioinformatics.

[39]  Wei Jiang,et al.  A Map-Reduce System with an Alternate API for Multi-core Environments , 2010, 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing.

[40]  Philip S. Yu,et al.  SPADE: the system s declarative stream processing engine , 2008, SIGMOD Conference.

[41]  Hamid Pirahesh,et al.  Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals , 1996, Data Mining and Knowledge Discovery.

[42]  Surajit Chaudhuri,et al.  An overview of query optimization in relational systems , 1998, PODS.

[43]  Ronald Fagin,et al.  Extendible hashing—a fast access method for dynamic files , 1979, ACM Trans. Database Syst..

[44]  Jaideep Srivastava,et al.  Aggregation Algorithms for Very Large Compressed Data Warehouses , 1999, VLDB.

[45]  Yuan Yu,et al.  Dryad: distributed data-parallel programs from sequential building blocks , 2007, EuroSys '07.

[46]  Mendel Rosenblum,et al.  The design and implementation of a log-structured file system , 1991, SOSP '91.

[47]  Raghu Ramakrishnan,et al.  bLSM: a general purpose log structured merge tree , 2012, SIGMOD Conference.

[48]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[49]  Randy H. Katz,et al.  Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center , 2011, NSDI.

[50]  Helen J. Wang,et al.  Online aggregation , 1997, SIGMOD '97.

[51]  Hosung Park,et al.  What is Twitter, a social network or a news media? , 2010, WWW '10.

[52]  Abraham Lempel,et al.  Compression of individual sequences via variable-rate coding , 1978, IEEE Trans. Inf. Theory.

[53]  Roberto Grossi,et al.  Compressed suffix arrays and suffix trees with applications to text indexing and string matching (extended abstract) , 2000, STOC '00.

[54]  Abraham Lempel,et al.  A universal algorithm for sequential data compression , 1977, IEEE Trans. Inf. Theory.

[55]  Prashant Malik,et al.  Cassandra: a decentralized structured storage system , 2010, OPSR.

[56]  Justin Talbot,et al.  Phoenix++: modular MapReduce for shared-memory systems , 2011, MapReduce '11.

[57]  David J. DeWitt,et al.  Parallel database systems: the future of high performance database systems , 1992, CACM.

[58]  Peter Sanders,et al.  : Standard Template Library for XXL Data Sets , 2005, ESA.

[59]  Raghu Ramakrishnan,et al.  Bottom-up computation of sparse and Iceberg CUBE , 1999, SIGMOD '99.

[60]  Geoffrey C. Fox,et al.  Twister: a runtime for iterative MapReduce , 2010, HPDC '10.

[61]  Xiaowei Shen,et al.  Performance of hardware compressed main memory , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[62]  Alok Aggarwal,et al.  The input/output complexity of sorting and related problems , 1988, CACM.

[63]  Luiz André Barroso,et al.  The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines , 2009, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines.

[64]  W. B. Cavnar,et al.  N-gram-based text categorization , 1994 .

[65]  Liang Lin,et al.  Tenzing a SQL implementation on the MapReduce framework , 2011, Proc. VLDB Endow..

[66]  Wilson C. Hsieh,et al.  Bigtable: A Distributed Storage System for Structured Data , 2006, TOCS.

[67]  James Reinders,et al.  Intel® threading building blocks , 2008 .