Drum: A rhythmic approach to interactive analytics on large data

In this paper, we study how to progressively answer a time-consuming query on a large data set by generating a sequence of mini-queries. We formulate an optimization problem that chooses the predicates of the mini-queries by considering both their total running time and the smoothness of result delivery, so that incremental results are shown to the user at a rhythmic pace, improving the user experience. We develop an adaptive framework called Drum that collects runtime behavioral statistics from the database system to decide the predicate of the next mini-query. The framework is a general middleware solution that requires no changes to the underlying database system. We have conducted extensive experiments on a large, real data set, and the results show that Drum can reduce the delay in delivering intermediate results to the user without significantly increasing the total running time.
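To make the idea of adaptive mini-query generation concrete, here is a minimal Python sketch. It assumes a single numeric attribute whose range is split into consecutive range predicates, and it assumes that a mini-query's running time grows roughly linearly with the width of its predicate. The names `MiniQuery`, `next_width`, `run_progressively`, and `delivery_stats`, the three-query averaging window, the simulated cost model, and the use of the standard deviation of mini-query durations as a smoothness measure are all hypothetical illustrations. This is not Drum's actual optimization formulation.

```python
from dataclasses import dataclass
from statistics import pstdev
from typing import Callable, List, Tuple


@dataclass
class MiniQuery:
    lo: float  # inclusive lower bound of the (hypothetical) range predicate
    hi: float  # exclusive upper bound


def next_width(observed: List[Tuple[float, float]], target_interval: float,
               default_width: float, window: int = 3) -> float:
    """Pick the next predicate width so the next mini-query should take
    about target_interval seconds, based on recent (width, seconds) samples."""
    recent = observed[-window:]
    total_t = sum(t for _, t in recent)
    if not recent or total_t <= 0:
        return default_width  # no usable statistics yet: fall back to a guess
    rate = sum(w for w, _ in recent) / total_t  # width processed per second
    return rate * target_interval


def run_progressively(domain_lo: float, domain_hi: float,
                      execute: Callable[[MiniQuery], float],
                      target_interval: float, default_width: float,
                      min_width: float = 1e-6) -> List[Tuple[MiniQuery, float]]:
    """Cover [domain_lo, domain_hi) with consecutive mini-queries, adapting
    each predicate to the running times observed so far."""
    observed: List[Tuple[float, float]] = []
    schedule: List[Tuple[MiniQuery, float]] = []
    lo = domain_lo
    while lo < domain_hi:
        width = max(next_width(observed, target_interval, default_width), min_width)
        hi = min(lo + width, domain_hi)
        query = MiniQuery(lo, hi)
        seconds = execute(query)  # a real middleware would send SQL and time it
        observed.append((hi - lo, seconds))
        schedule.append((query, seconds))
        lo = hi
    return schedule


def delivery_stats(schedule: List[Tuple[MiniQuery, float]]) -> Tuple[float, float]:
    """Return (total time, std. dev. of mini-query durations); the second
    value is a stand-in smoothness measure, lower meaning a steadier rhythm."""
    durations = [s for _, s in schedule]
    return sum(durations), (pstdev(durations) if durations else 0.0)


# Simulated cost model (an assumption): 0.2 s fixed overhead + 0.01 s per unit width.
def simulated_execute(q: MiniQuery) -> float:
    return 0.2 + 0.01 * (q.hi - q.lo)


schedule = run_progressively(0.0, 1000.0, simulated_execute,
                             target_interval=1.0, default_width=10.0)
total, jitter = delivery_stats(schedule)
print(f"{len(schedule)} mini-queries, total {total:.2f} s, stdev {jitter:.3f} s")
```

With this simulated cost model, the predicate width grows from the initial guess toward the width that takes about one second, so intermediate results arrive at a roughly steady pace. Because every mini-query pays a fixed overhead, a smaller target interval yields more frequent updates but a larger total time, which illustrates the trade-off between total running time and smoothness that the abstract describes.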
