From TPC-C to Big Data Benchmarks: A Functional Workload Model

Big data systems help organizations store, manipulate, and derive value from vast amounts of data. Relational databases and MapReduce are the two most prominent technologies for such systems. Organizations use them to perform complex analysis on diverse and unconventional data types with fast-growing data volumes. As more big data systems are deployed, the industry faces the challenge of developing representative benchmarks that can evaluate the capabilities of competing implementations. In this position paper, we argue for building future big data benchmarks using what we call a "functional workload model". This concept draws on combined experience with standard benchmarks, exemplified by TPC-C. The functional workload model describes the functional goals that the system must achieve, the data access patterns, the load variations over time, and the computation required to achieve the functional goals. Abstracting functional workload models from empirical studies of MapReduce deployments is the first step towards building truly representative big data benchmarks.
