A framework for autonomous knowledge discovery from databases

This research evaluates the sufficiency of an agenda- and justification-based framework for fully autonomous discovery systems. The proposed framework provides a reasoning component that autonomously selects discovery tasks by ranking them according to their plausibility. A task's plausibility is computed from the interestingness of the items involved in the task and the strengths of the justifications given for performing it. Heuristics are used to perform tasks and to propose new ones. In addition, the framework is highly modular, which makes a discovery system built on it easier to extend. The framework's sufficiency was demonstrated by implementing it in a prototype system called HAMB and using it to make discoveries in the domain of experimental conditions that favor the growth of crystals of proteins and DNA-protein complexes for X-ray crystallographic studies. Details of the prototype's implementation, such as its interestingness function and its heuristics for performing, proposing, and justifying tasks, are reviewed, and the results of evaluating both the framework and HAMB are presented and discussed.
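The agenda mechanism described above can be illustrated with a minimal sketch. This is not HAMB's implementation: the abstract does not give the formula that combines interestingness with justification strengths, so the `plausibility` function below (mean item interestingness multiplied by the summed justification strengths) is an assumption, as are the names `Task`, `Agenda`, `propose`, and `pop_most_plausible` and the example scores.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    items: list                                          # items the task operates on
    justifications: list = field(default_factory=list)   # strengths, assumed in [0, 1]


def plausibility(task, interestingness):
    """Hypothetical combination rule (an assumption, not taken from the source):
    mean interestingness of the task's items times the sum of its justification strengths."""
    if not task.items or not task.justifications:
        return 0.0
    mean_interest = sum(interestingness.get(i, 0.0) for i in task.items) / len(task.items)
    return mean_interest * sum(task.justifications)


class Agenda:
    """Priority queue of proposed tasks, ordered by plausibility (highest first)."""

    def __init__(self, interestingness):
        self.interestingness = interestingness
        self._heap = []
        self._counter = 0  # tie-breaker so Task objects are never compared

    def propose(self, task):
        score = plausibility(task, self.interestingness)
        # heapq is a min-heap, so store the negated score
        heapq.heappush(self._heap, (-score, self._counter, task))
        self._counter += 1

    def pop_most_plausible(self):
        neg_score, _, task = heapq.heappop(self._heap)
        return task, -neg_score

    def __len__(self):
        return len(self._heap)


# Illustrative values only; these are not data from the paper.
interest = {"pH": 0.9, "temperature": 0.4}
agenda = Agenda(interest)
agenda.propose(Task("examine-temperature", ["temperature"], [0.9]))
agenda.propose(Task("examine-pH", ["pH"], [0.5, 0.3]))

task, score = agenda.pop_most_plausible()
print(task.name, round(score, 2))  # the pH task ranks first under these scores
```

In a fuller system, performing the selected task would invoke heuristics that may call `propose` again with new tasks and justifications, so the agenda keeps refilling as discovery proceeds.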
