Smart Drill Down

We present smart drill-down, an operator for interactively exploring a relational table to discover and summarize “interesting” groups of tuples. Each group of tuples is described by a rule. For instance, the rule (a, b, ?, 1000) tells us that there are a thousand tuples with value a in the first column and b in the second column (and any value in the third column). Smart drill-down presents an analyst with a list of rules that together describe interesting aspects of the table. The analyst can tailor the definition of interesting, and can interactively apply smart drill-down on an existing rule to explore that part of the table. We demonstrate that the underlying optimization problems are NP-HARD, and describe an algorithm for finding the approximately optimal list of rules to display when the user uses a smart drill-down, and a dynamic sampling scheme for efficiently interacting with large tables. Finally, we perform experiments on real datasets to demonstrate the usefulness of smart drill-down and study the performance of our algorithms.

[1]  Ramakrishnan Srikant,et al.  Mining quantitative association rules in large relational tables , 1996, SIGMOD '96.

[2]  Bei Yu,et al.  On generating near-optimal tableaux for conditional functional dependencies , 2008, Proc. VLDB Endow..

[3]  Renée J. Miller,et al.  Association rules over interval data , 1997, SIGMOD '97.

[4]  Laks V. S. Lakshmanan,et al.  MDL Summarization with Holes , 2005, VLDB.

[5]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[6]  Ion Stoica,et al.  BlinkDB: queries with bounded errors and bounded response times on very large data , 2012, EuroSys '13.

[7]  A. I. McLeod,et al.  A Convenient Algorithm for Drawing a Simple Random Sample , 1983 .

[8]  Sunita Sarawagi,et al.  User-Adaptive Exploration of Multidimensional Data , 2000, VLDB.

[9]  Yang Xiang,et al.  Succinct summarization of transactional databases: an overlapped hyperrectangle scheme , 2008, KDD.

[10]  P. Grünwald The Minimum Description Length Principle (Adaptive Computation and Machine Learning) , 2007 .

[11]  Jiawei Han,et al.  TFP: an efficient algorithm for mining top-k frequent closed itemsets , 2005, IEEE Transactions on Knowledge and Data Engineering.

[12]  Nimrod Megiddo,et al.  Discovery-Driven Exploration of OLAP Data Cubes , 1998, EDBT.

[13]  Divesh Srivastava,et al.  Efficient and Effective Analysis of Data Quality using Pattern Tableaux , 2011, IEEE Data Eng. Bull..

[14]  Sunita Sarawagi User-cognizant multidimensional analysis , 2001, The VLDB Journal.

[15]  Laks V. S. Lakshmanan,et al.  The Generalized MDL Approach for Summarization , 2002, VLDB.

[16]  Jilles Vreeken,et al.  Tell me what i need to know: succinctly summarizing data with itemsets , 2011, KDD.

[17]  Hamid Pirahesh,et al.  Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals , 1996, Data Mining and Knowledge Discovery.

[18]  Jeffrey Scott Vitter,et al.  Random sampling with a reservoir , 1985, TOMS.

[19]  Jian Pei,et al.  Mining frequent patterns without candidate generation , 2000, SIGMOD '00.

[20]  Parag Agrawal,et al.  Interpretable and Informative Explanations of Outcomes , 2014, Proc. VLDB Endow..

[21]  Sridhar Ramaswamy,et al.  The Aqua approximate query answering system , 1999, SIGMOD '99.

[22]  Bart Goethals,et al.  Tiling Databases , 2004, Discovery Science.