Automatic assessment of interactive OLAP explorations

Abstract
Interactive Database Exploration (IDE) is the process of exploring a database through a sequence of queries that aim to answer an often imprecise user information need. In this paper, we address the following problem: how to automatically assess the quality of such an exploration. We study this problem from three angles. First, we formulate the hypothesis that the quality of an exploration can be measured by evaluating how the user's skill at writing queries that contribute to the exploration improves over time. Second, we restrict our attention to a particular use case of database exploration, namely OLAP explorations of data cubes. Third, we propose to use simple query features to model a query's contribution to an exploration. The first hypothesis allows us to use Knowledge Tracing, a popular model of skill acquisition, to measure the evolution of the ability to write contributive queries. The restriction to OLAP explorations allows us to take advantage of well-known OLAP primitives and schemas. Finally, using query features allows us to apply a supervised learning approach to model query contribution. We show on both real and artificial explorations that automatic assessment of OLAP explorations is feasible and consistent with the viewpoints of both users and experts.
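To make the first hypothesis concrete, the Python sketch below shows a standard Bayesian Knowledge Tracing update, where the usual correct/incorrect answer is replaced by a binary label indicating whether a query is contributive (such labels could come, for instance, from a supervised classifier over query features, as the abstract suggests). The parameter values and the observation sequence are illustrative assumptions, not values from the paper.

# Minimal sketch of Bayesian Knowledge Tracing (BKT), the skill-acquisition
# model referred to above. The binary observation here is whether a query
# is judged contributive to the exploration, in place of BKT's usual
# correct/incorrect answer. All parameter values are placeholders.

class BKT:
    def __init__(self, p_init=0.3, p_learn=0.2, p_guess=0.1, p_slip=0.1):
        self.p_know = p_init    # P(L0): prior probability the skill is mastered
        self.p_learn = p_learn  # P(T): probability of learning between steps
        self.p_guess = p_guess  # P(G): contributive query without mastery
        self.p_slip = p_slip    # P(S): non-contributive query despite mastery

    def update(self, contributive: bool) -> float:
        """Condition mastery on one observation, then apply the learning step."""
        if contributive:
            num = self.p_know * (1 - self.p_slip)
            den = num + (1 - self.p_know) * self.p_guess
        else:
            num = self.p_know * self.p_slip
            den = num + (1 - self.p_know) * (1 - self.p_guess)
        posterior = num / den
        # Learning transition: in BKT, mastery can be acquired but never lost.
        self.p_know = posterior + (1 - posterior) * self.p_learn
        return self.p_know

# Tracing the estimated skill over one exploration, given per-query
# contribution labels (an illustrative sequence):
bkt = BKT()
for contributes in [False, True, True, False, True, True]:
    print(round(bkt.update(contributes), 3))

Under this reading, an exploration is assessed as good when the traced mastery probability rises over the session, i.e., when the user increasingly issues contributive queries.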
