Sets of Robust Rules, and How to Find Them

Association rules are among the most important concepts in data mining. Rules of the form \(X \rightarrow Y\) are simple to understand and simple to act upon, yet can model important local dependencies in data. The problem, however, is that there are so many of them. Both traditional and state-of-the-art frameworks typically yield millions of rules, rather than identifying a small set of rules that captures the most important dependencies in the data. In this paper, we define the problem of association rule mining in terms of the Minimum Description Length (MDL) principle. That is, we identify the best set of rules as the one that most succinctly describes the data. We show that the resulting optimization problem does not lend itself to exact search, and hence propose Grab, a greedy heuristic that efficiently discovers good sets of noise-resistant rules directly from data. Through extensive experiments we show that, unlike the state of the art, Grab reliably recovers the ground truth. On real-world data we show that it finds reasonable numbers of rules that, upon close inspection, give clear insight into the local distribution of the data.
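
To make "most succinctly describes the data" concrete, the generic two-part form of the MDL principle can be written as below. This is a sketch of the standard textbook objective; the symbols \(D\) (the data), \(\mathcal{R}\) (the set of candidate rule sets), \(R\) (one rule set), and \(L(\cdot)\) (encoded length in bits) are notation assumed here for illustration, and the specific encoding used by the paper is not reproduced.

\[
R^{*} \;=\; \operatorname*{arg\,min}_{R \in \mathcal{R}} \; \bigl( L(R) + L(D \mid R) \bigr)
\]

Here \(L(R)\) is the cost of describing the rule set itself and \(L(D \mid R)\) is the cost of describing the data given those rules. A rule is worth adding only when it shortens \(L(D \mid R)\) by more than it lengthens \(L(R)\), which is what keeps the selected set small.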
