What Is Interesting: Studies on Interestingness in Knowledge Discovery

Knowledge Discovery in Databases (KDD) was defined by [FPSS96a] as “[...] the non-trivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in data.” As databases grow, so does the number of patterns mined from them, and this number can easily grow large enough to overwhelm users. To address this problem, patterns need to be processed for interestingness in order to distinguish the “valid, novel, potentially useful and ultimately understandable patterns” from those that are not. Interestingness has been identified as a central problem in KDD. In this work, we study this problem: the interestingness problem. We start with an empirical evaluation of objective interestingness ranking criteria (in Chapter 3, based on [SM99]). Interestingness is ultimately subjective, and so we go on to introduce a new approach to subjective interestingness (in Chapter 4, based on [Sah99]). This approach differs from the two prior subjective interestingness approaches in that it uses very little, very simple domain knowledge, which is easy to acquire, to quickly eliminate the majority of the rules that are not interesting. We investigate how this approach can be incorporated into the mining process (in Chapter 5, based on [Sah02b]). To automate much of the tedious manual labor normally involved in interestingness exploration, we introduce a clustering framework for association rules, providing a data-driven grouping and naming scheme for the mined rules (in Chapter 6, based on [Sah02a]). The problem of interestingness is a notoriously difficult one. To reduce the size of the problem, we introduce a new type of interestingness criteria, the Impartial Interestingness Criteria, as the first step of the interestingness framework, the Interestingness PreProcessing Step (in Chapter 7, based on [Sah01]). The impartial interestingness criteria can be applied automatically to the output of any association mining algorithm, independently of its domain, task and users, to eliminate a significant portion of the not-interesting rules.
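To make the notion of an objective interestingness ranking criterion concrete, the following minimal Python sketch ranks association rules by a measure computed from the data alone. The measures used here (support, confidence and lift) are standard in the association-rule literature and are assumed purely for illustration; the abstract does not say which criteria are evaluated in Chapter 3. The function names `support`, `rule_measures` and `rank_rules` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset]  # (antecedent, consequent)


def support(itemset: Itemset, transactions: List[Itemset]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        raise ValueError("transactions must not be empty")
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rule_measures(rule: Rule, transactions: List[Itemset]) -> Dict[str, float]:
    """Support, confidence and lift of the rule antecedent -> consequent."""
    antecedent, consequent = rule
    s_a = support(antecedent, transactions)
    s_c = support(consequent, transactions)
    s_ac = support(antecedent | consequent, transactions)
    # Guard against division by zero for itemsets that never occur.
    confidence = s_ac / s_a if s_a else 0.0
    lift = confidence / s_c if s_c else 0.0
    return {"support": s_ac, "confidence": confidence, "lift": lift}


def rank_rules(rules: List[Rule], transactions: List[Itemset],
               key: str = "lift") -> List[Rule]:
    """Order rules from most to least interesting under the chosen measure."""
    return sorted(rules,
                  key=lambda r: rule_measures(r, transactions)[key],
                  reverse=True)
```

A ranking of this kind depends only on the data, which is why the text treats such criteria as objective; choosing among the ranked rules for a particular user still requires the subjective step.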
