Knowledge Discovery and Interestingness Measures: A Survey

Knowledge discovery in databases, also known as data mining, is the efficient discovery of previously unknown, valid, novel, potentially useful, and understandable patterns in large databases. It encompasses many techniques and algorithms, which differ in the kinds of data they can analyze and in the form of knowledge representation used to convey the discovered knowledge. An important problem in data mining is the development of effective measures of interestingness for ranking the discovered knowledge. In this report, we provide a general overview of the more successful and widely known data mining techniques and algorithms, and survey seventeen interestingness measures from the literature that have been successfully employed in data mining applications.
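
To make the idea of ranking discovered knowledge by an interestingness measure concrete, the following is a minimal sketch in Python. It assumes a rule X -> Y is summarized by three counts over N tuples, and it scores each rule by how far the observed co-occurrence count exceeds the count expected if X and Y were statistically independent (a rule-interest style measure). The `Rule` class, the function names, and the example counts are hypothetical and chosen only for illustration; the sketch is not claimed to reproduce any particular measure as defined in the report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical summary of a rule X -> Y by its tuple counts."""
    name: str
    n_antecedent: int  # tuples satisfying X
    n_consequent: int  # tuples satisfying Y
    n_both: int        # tuples satisfying both X and Y


def rule_interest(rule: Rule, n_total: int) -> float:
    """Observed co-occurrence count minus the count expected under independence."""
    expected = rule.n_antecedent * rule.n_consequent / n_total
    return rule.n_both - expected


def rank_rules(rules: list[Rule], n_total: int) -> list[Rule]:
    """Order rules from most to least interesting under rule_interest."""
    return sorted(rules, key=lambda r: rule_interest(r, n_total), reverse=True)


# Illustrative counts only (not taken from any dataset).
N_TOTAL = 1000
RULES = [
    Rule("milk -> eggs", n_antecedent=500, n_consequent=400, n_both=200),
    Rule("tea -> coffee", n_antecedent=200, n_consequent=300, n_both=30),
    Rule("bread -> butter", n_antecedent=400, n_consequent=300, n_both=200),
]

if __name__ == "__main__":
    for rule in rank_rules(RULES, N_TOTAL):
        print(f"{rule.name}: {rule_interest(rule, N_TOTAL):+.1f}")
```

A score of zero means the rule's antecedent and consequent co-occur exactly as often as independence would predict, so under this measure the rule conveys nothing new; positive scores indicate positive association and negative scores indicate that the two occur together less often than expected.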
