Paper information - Better Rulesets by Removing Redundant Specialisations and Generalisations in Association Rule Mining


Association rule mining is a fundamental task in many data mining and analysis applications, both for knowledge extraction and as part of other processes (for example, building associative classifiers). It is well known that many association rule mining algorithms identify so many associations that interpretation and practical use become difficult. A typical solution is to remove redundant rules. This paper proposes a novel definition of redundancy, which is used to identify only the most interesting associations. Compared to existing redundancy-based approaches, our method is more robust to noise and produces fewer rules overall for a given dataset, improving clarity. A rule can be considered redundant if the knowledge it describes is already contained in other rules. Most existing approaches consider a rule redundant if it adds variables to an existing association rule without increasing quality according to some measure of interestingness. We claim that complex interactions between variables can confound many interestingness measures, which can make existing approaches overly aggressive in removing associations as redundant. Most existing approaches also fail to account for situations where more general rules (those with fewer attributes) can be considered redundant with respect to their specialisations. We examine this problem and provide concrete examples of such errors using artificial data. An alternative definition of redundancy that addresses these issues is proposed. Our approach is shown to identify interesting associations missed by comparable methods on multiple real and synthetic datasets. When combined with the removal of redundant generalisations, our approach is often able to generate smaller overall rule sets, while leaving average rule quality unaffected or slightly improved.
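The conventional notion of redundancy that the abstract attributes to most existing approaches (a specialisation that adds variables without increasing quality) can be illustrated with a minimal sketch. This is not the definition proposed in the paper, which the abstract does not spell out. The `Rule` type, the function names, the example items, and the quality values are all hypothetical, and the quality measure is left abstract (it could be confidence, lift, or another interestingness measure).

```python
from typing import FrozenSet, List, NamedTuple


class Rule(NamedTuple):
    """Hypothetical rule representation: antecedent -> consequent."""
    antecedent: FrozenSet[str]
    consequent: str
    quality: float  # assumed interestingness score (e.g. confidence)


def is_redundant_specialisation(rule: Rule, rules: List[Rule]) -> bool:
    """Return True if some strictly more general rule with the same
    consequent has quality at least as high as `rule`."""
    return any(
        other.consequent == rule.consequent
        and other.antecedent < rule.antecedent  # proper subset: fewer variables
        and other.quality >= rule.quality       # extra variables did not help
        for other in rules
    )


def prune_specialisations(rules: List[Rule]) -> List[Rule]:
    """Keep only rules that are not redundant specialisations."""
    return [r for r in rules if not is_redundant_specialisation(r, rules)]


# Illustrative, made-up rules and scores.
rules = [
    Rule(frozenset({"bread"}), "butter", 0.80),
    Rule(frozenset({"bread", "milk"}), "butter", 0.78),  # adds "milk", no gain
    Rule(frozenset({"bread", "jam"}), "butter", 0.95),   # adds "jam", improves
]
kept = prune_specialisations(rules)
```

In this sketch the `{bread, milk}` rule is dropped because the more general `{bread}` rule scores at least as well, while the `{bread, jam}` rule survives. The abstract argues that a filter of this kind can be overly aggressive when variable interactions confound the quality measure, and that it ignores the opposite case in which a general rule is redundant with respect to its specialisations.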
