Benchmarking Exploratory OLAP

Supporting interactive database exploration (IDE) is a problem that has recently attracted considerable attention. Exploratory OLAP (On-Line Analytical Processing) is an important use case in which tools support the navigation and analysis of the most interesting data, using the best possible perspectives. While many approaches have been proposed (e.g., query recommendation, reuse, steering, personalization, or unexpected-data recommendation), a recurring problem is how to assess the effectiveness of an exploratory OLAP approach. In this paper we propose a benchmark framework to do so, which relies on an extensible set of user-centric metrics that relate to the main dimensions of exploratory analysis. Specifically, we describe how to model and simulate user activity, how to formalize our metrics, and how to build exploratory tasks to properly evaluate an IDE system under test (SUT). To the best of our knowledge, this is the first proposal of such a benchmark. Our experiments are two-fold: first, we evaluate the benchmark protocol and metrics on synthetic SUTs whose behavior is well known; second, we use the benchmark to evaluate and compare two recent SUTs from the IDE literature. Finally, the conclusion lists potential extensions towards an industry-strength benchmark.
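The protocol sketched in the abstract (simulate user activity, let the SUT react, score it with user-centric metrics) can be illustrated with a toy loop. This is a hypothetical sketch, not the paper's actual framework: the session generator, the stand-in SUT, and the recall@k metric below are all illustrative stand-ins for the benchmark's real user model, system under test, and metric set.

```python
import random

def simulate_user_session(seed, length=5):
    """Generate a toy exploratory session as a list of query ids.

    Real user simulation would model OLAP operations (drill-down,
    roll-up, slice); here a session just drifts between nearby ids.
    """
    rng = random.Random(seed)
    query, session = rng.randrange(100), []
    for _ in range(length):
        session.append(query)
        query = (query + rng.randrange(1, 10)) % 100  # move to a "nearby" query
    return session

def sut_recommend(history, k=3):
    """Stand-in SUT: recommend the k queries 'closest' to the last one."""
    last = history[-1]
    return [(last + d) % 100 for d in range(1, k + 1)]

def recall_at_k(recommended, actual_next):
    """One possible user-centric metric: did the SUT anticipate
    the simulated user's next move?"""
    return 1.0 if actual_next in recommended else 0.0

def run_benchmark(n_sessions=20):
    """Replay simulated sessions against the SUT and average the metric."""
    scores = []
    for seed in range(n_sessions):
        session = simulate_user_session(seed)
        for i in range(1, len(session) - 1):
            recs = sut_recommend(session[: i + 1])
            scores.append(recall_at_k(recs, session[i + 1]))
    return sum(scores) / len(scores)

print(f"mean recall@3: {run_benchmark():.2f}")
```

Swapping in different `sut_recommend` implementations while holding the simulated sessions and metrics fixed is what makes such a protocol a benchmark: synthetic SUTs with known behavior can validate the metrics before real systems are compared.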
