Discovering knowledge from noisy databases using genetic programming

Data mining emphasizes the need to learn from huge, incomplete, and imperfect data sets. To handle noise in the problem domain, existing learning systems avoid overfitting the imperfect training examples by excluding insignificant patterns. One problem is that these systems use a restrictive attribute‐value language for representing the training examples and the induced knowledge. Moreover, some important patterns are ignored because they are statistically insignificant. In this article, we present a framework that combines Genetic Programming and Inductive Logic Programming to induce knowledge, represented in various knowledge representation formalisms, from noisy databases. The framework is based on a formalism of logic grammars, which allows the search space to be specified declaratively. An implementation of the framework, LOGENPRO (The Logic grammar based GENetic PROgramming system), has been developed. The performance of LOGENPRO is evaluated on the chess end‐game domain. We compare LOGENPRO with FOIL and other learning systems in detail, and find that it performs significantly better than the others. This result indicates that the Darwinian principle of natural selection is a plausible noise‐handling method that can avoid overfitting and identify important patterns at the same time. Moreover, the system is applied to one real‐life medical database. The knowledge discovered provides insights into, and allows a better understanding of, the medical domain.
