Smart Drill Down

We present smart drill-down, an operator for interactively exploring a relational table to discover and summarize “interesting” groups of tuples. Each group of tuples is described by a rule. For instance, the rule (a, b, ?, 1000) tells us that there are a thousand tuples with value a in the first column and b in the second column (and any value in the third column). Smart drill-down presents an analyst with a list of rules that together describe interesting aspects of the table. The analyst can tailor the definition of interesting, and can interactively apply smart drill-down on an existing rule to explore that part of the table. We demonstrate that the underlying optimization problems are NP-hard, and describe an algorithm for finding an approximately optimal list of rules to display when the user performs a smart drill-down, as well as a dynamic sampling scheme for efficiently interacting with large tables. Finally, we perform experiments on real datasets to demonstrate the usefulness of smart drill-down and study the performance of our algorithms.
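The rule notation in the abstract can be illustrated with a minimal sketch. It covers only the meaning of a rule such as (a, b, ?, 1000), that is, a pattern with wildcards plus the number of tuples it covers. It is not the paper's algorithm for choosing which rules to display. The names `WILDCARD`, `matches`, `rule_count`, and `covered_rows`, and the toy table, are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical marker for "any value in this column".
WILDCARD = "?"

Row = Tuple[str, ...]


def matches(rule: Sequence[str], row: Sequence[str]) -> bool:
    """Return True if every non-wildcard entry of the rule equals the row's value."""
    return all(r == WILDCARD or r == v for r, v in zip(rule, row))


def rule_count(rule: Sequence[str], table: Sequence[Row]) -> int:
    """Count the tuples of the table covered by the rule."""
    return sum(1 for row in table if matches(rule, row))


def covered_rows(rule: Sequence[str], table: Sequence[Row]) -> List[Row]:
    """Restrict the table to the tuples covered by the rule.

    This is only the "explore that part of the table" step; selecting
    interesting sub-rules within it is not attempted here.
    """
    return [row for row in table if matches(rule, row)]


# Toy three-column table (illustrative data, not from the paper).
table: List[Row] = [
    ("a", "b", "x"),
    ("a", "b", "y"),
    ("a", "c", "x"),
    ("d", "b", "x"),
]

rule = ("a", "b", WILDCARD)
# Append the count to the pattern, mirroring the (a, b, ?, count) notation.
print(rule + (rule_count(rule, table),))  # ('a', 'b', '?', 2)
```

On a table with a thousand tuples whose first two columns are a and b, the same call would yield the rule (a, b, ?, 1000) from the abstract.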

Aditya G. Parameswaran | Hector Garcia-Molina | Manas R. Joglekar

[1] Ramakrishnan Srikant, et al. Mining quantitative association rules in large relational tables, 1996, SIGMOD '96.

[2] Bei Yu, et al. On generating near-optimal tableaux for conditional functional dependencies, 2008, Proc. VLDB Endow.

[3] Renée J. Miller, et al. Association rules over interval data, 1997, SIGMOD '97.

[4] Laks V. S. Lakshmanan, et al. MDL Summarization with Holes, 2005, VLDB.

[5] Ramakrishnan Srikant, et al. Fast Algorithms for Mining Association Rules in Large Databases, 1994, VLDB.

[6] Ion Stoica, et al. BlinkDB: queries with bounded errors and bounded response times on very large data, 2012, EuroSys '13.

[7] A. I. McLeod, et al. A Convenient Algorithm for Drawing a Simple Random Sample, 1983.

[8] Sunita Sarawagi, et al. User-Adaptive Exploration of Multidimensional Data, 2000, VLDB.

[9] Yang Xiang, et al. Succinct summarization of transactional databases: an overlapped hyperrectangle scheme, 2008, KDD.

[10] P. Grünwald. The Minimum Description Length Principle (Adaptive Computation and Machine Learning), 2007.

[11] Jiawei Han, et al. TFP: an efficient algorithm for mining top-k frequent closed itemsets, 2005, IEEE Transactions on Knowledge and Data Engineering.

[12] Nimrod Megiddo, et al. Discovery-Driven Exploration of OLAP Data Cubes, 1998, EDBT.

[13] Divesh Srivastava, et al. Efficient and Effective Analysis of Data Quality using Pattern Tableaux, 2011, IEEE Data Eng. Bull.

[14] Sunita Sarawagi. User-cognizant multidimensional analysis, 2001, The VLDB Journal.

[15] Laks V. S. Lakshmanan, et al. The Generalized MDL Approach for Summarization, 2002, VLDB.

[16] Jilles Vreeken, et al. Tell me what I need to know: succinctly summarizing data with itemsets, 2011, KDD.

[17] Hamid Pirahesh, et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals, 1996, Data Mining and Knowledge Discovery.

[18] Jeffrey Scott Vitter, et al. Random sampling with a reservoir, 1985, TOMS.

[19] Jian Pei, et al. Mining frequent patterns without candidate generation, 2000, SIGMOD '00.

[20] Parag Agrawal, et al. Interpretable and Informative Explanations of Outcomes, 2014, Proc. VLDB Endow.

[21] Sridhar Ramaswamy, et al. The Aqua approximate query answering system, 1999, SIGMOD '99.

[22] Bart Goethals, et al. Tiling Databases, 2004, Discovery Science.