Knowledge discovery from data?

The knowledge discovery and data mining (KDD) field draws on findings from statistics, databases, and artificial intelligence to construct tools that let users gain insight from massive data sets. People in business, science, medicine, academia, and government collect such data sets, and several commercial packages now offer general-purpose KDD tools. An important KDD goal is to "turn data into knowledge". For example, knowledge acquired by applying such methods to a medical database could be published in a medical journal. Knowledge acquired from analyzing a financial or marketing database could revise business practice and influence a management school's curriculum. In addition, some US laws require lenders to give reasons for rejecting a loan application, which knowledge discovered through KDD could provide. Occasionally, you must even explain the learned decision criteria to a court, as in the recent lawsuit Blue Mountain filed against Microsoft over a mail filter that classified electronic greeting cards as spam. We expect more from knowledge discovery tools than the accurate models that machine learning, statistics, and pattern recognition already aim to create. We can fully realize the benefits of data mining only by paying attention to the cognitive factors that make the resulting models coherent, credible, easy to use, and easy to communicate to others.
