Are we really discovering “interesting” knowledge from data?

This paper is a critical review of the literature on discovering comprehensible, interesting knowledge (or patterns) from data. The motivation for this review is that the majority of the literature focuses only on maximizing the accuracy of the discovered patterns, ignoring other important, user-oriented pattern-quality criteria such as comprehensibility and interestingness. The word “interesting” has been used with several different meanings in the data mining literature; in this paper, it essentially means novel or surprising. Although comprehensibility and interestingness are considerably harder to measure formally than accuracy, they seem highly relevant criteria to consider if we are serious about discovering knowledge that is not only accurate but also useful for human decision making. The paper discusses both data-driven methods (based mainly on statistical properties of the patterns) and user-driven methods (which take into account the user’s background knowledge or beliefs) for discovering interesting knowledge. Data-driven methods are discussed in more detail because they are more common in the literature and more controversial. The paper also suggests future research directions in the discovery of interesting knowledge.
