Extracting refined rules from knowledge-based neural networks

Neural networks, despite their empirically proven abilities, have been little used for the refinement of existing knowledge because this task requires a three-step process. First, knowledge must be inserted into a neural network. Second, the network must be refined. Third, the refined knowledge must be extracted from the network. We have previously described a method for the first step of this process. Standard neural learning techniques can accomplish the second step. In this article, we propose and empirically evaluate a method for the final, and possibly most difficult, step. Our method efficiently extracts symbolic rules from trained neural networks. The four major results of empirical tests of this method are that the extracted rules 1) closely reproduce the accuracy of the network from which they are extracted; 2) are superior to the rules produced by methods that directly refine symbolic rules; 3) are superior to those produced by previous techniques for extracting rules from trained neural networks; and 4) are “human comprehensible.” Thus, this method demonstrates that neural networks can be used to effectively refine symbolic knowledge. Moreover, the rule-extraction technique developed herein contributes to the understanding of how symbolic and connectionist approaches to artificial intelligence can be profitably integrated.
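The core of the extraction step is easiest to see on a single trained unit. Below is a minimal Python sketch of the M-of-N idea: rewrite negative weights as positive weights on negated inputs, cluster weights of similar magnitude to a shared value, and read off rules of the form "if at least M of these N antecedents hold, the unit fires." The names (Unit, extract_m_of_n), the greedy one-dimensional clustering, and the boolean-input assumption are all illustrative choices, not the authors' published code; the full method additionally re-optimizes biases, eliminates insignificant weight groups, and applies this procedure across the whole network.

```python
# Sketch of M-of-N style rule extraction for one sigmoid unit (illustrative,
# not the authors' implementation). Assumes boolean {0,1} inputs and treats
# the unit as firing when its weighted sum plus bias exceeds zero.

import math
from dataclasses import dataclass

@dataclass
class Unit:
    antecedents: list[str]   # names of the unit's boolean inputs
    weights: list[float]     # trained incoming weights, one per input
    bias: float              # unit fires when weighted sum + bias > 0

def extract_m_of_n(unit: Unit, tol: float = 0.25) -> list[str]:
    # Step 1: rewrite each negative weight as a positive weight on the
    # negated input, using w*x + b == |w|*(NOT x) + (b + w) for boolean x.
    names, weights, bias = [], [], unit.bias
    for name, w in zip(unit.antecedents, unit.weights):
        if w < 0:
            names.append(f"NOT {name}")
            weights.append(-w)
            bias += w
        else:
            names.append(name)
            weights.append(w)

    # Step 2: greedily group weights whose neighbors (in sorted order) lie
    # within `tol`; each group's weights are replaced by the group mean.
    order = sorted(range(len(weights)), key=lambda i: weights[i])
    clusters, current = [], [order[0]]
    for i in order[1:]:
        if weights[i] - weights[current[-1]] <= tol:
            current.append(i)
        else:
            clusters.append(current)
            current = [i]
    clusters.append(current)

    # Step 3: for each cluster, find the smallest count M of true members
    # that pushes the net input above zero. (This treats clusters
    # independently for simplicity; the full method considers combinations.)
    rules = []
    for cluster in clusters:
        mean = sum(weights[i] for i in cluster) / len(cluster)
        if mean <= 0:
            continue  # a zero-weight cluster can never help the unit fire
        members = [names[i] for i in cluster]
        m = math.floor(-bias / mean) + 1  # smallest M with M*mean + bias > 0
        if 1 <= m <= len(members):
            rules.append(f"{m} of {{{', '.join(members)}}}")
    return rules

# Example: seven near-equal weights and bias -3.5 behave like "4 of 7".
unit = Unit([f"x{i}" for i in range(7)],
            [1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 1.0], -3.5)
print(extract_m_of_n(unit))   # -> ['4 of {x1, x4, x2, x5, x6, x3, x0}']
```

In the example, the seven weights collapse to a single cluster with mean 1.0, and since 4 * 1.0 - 3.5 > 0 while 3 * 1.0 - 3.5 < 0, the extracted rule is "at least 4 of the 7 antecedents." M-of-N groups of this kind are what let the extracted rules stay compact and human comprehensible rather than enumerating every conjunction separately.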
