Guiding Hidden Layer Representations for Improved Rule Extraction From Neural Networks

The large, opaque weight matrices produced by error backpropagation learning have inspired substantial research on extracting symbolic, human-readable rules from trained networks. While considerable progress has been made, current results remain limited, in part because of the large number of symbolic rules that can be generated. Most past work on this problem has focused on progressively more powerful rule extraction (RE) methods that try to minimize the number of weights and/or improve rule expressiveness. Here we take a different approach: we modify the error backpropagation training process so that it learns a hidden layer representation of input patterns different from the one that would normally arise. Using five publicly available datasets, we show via computational experiments that the modified learning method allows fewer rules to be extracted without increasing individual rule complexity and without decreasing classification accuracy. We conclude that modifying error backpropagation so that it more effectively separates learned pattern encodings in the hidden layer is an effective way to improve contemporary RE methods.
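
The abstract does not spell out the form of the modified training rule. As a rough illustration only, the sketch below assumes the modification is an auxiliary penalty added to the usual squared-error loss, lam * sum(h * (1 - h)), which pushes hidden sigmoid activations toward saturated values (0 or 1) so that learned encodings become more clearly separated and easier to cover with symbolic rules. All names and hyperparameters (train, lam, n_hidden, etc.) are hypothetical and illustrative, not taken from the paper.

    import numpy as np

    # Hypothetical sketch: one-hidden-layer MLP trained by backprop on squared
    # error, plus a penalty lam * sum(h * (1 - h)) that drives hidden sigmoid
    # activations toward 0 or 1 (an assumed stand-in for the paper's method).
    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train(X, y, n_hidden=4, lr=1.0, lam=0.05, epochs=5000):
        n_in = X.shape[1]
        W1 = rng.normal(scale=0.5, size=(n_in, n_hidden))
        b1 = np.zeros(n_hidden)
        W2 = rng.normal(scale=0.5, size=(n_hidden, 1))
        b2 = np.zeros(1)
        for _ in range(epochs):
            # forward pass
            h = sigmoid(X @ W1 + b1)        # hidden activations
            out = sigmoid(h @ W2 + b2)      # network output
            # output-layer delta for squared error with sigmoid units
            d_out = (out - y) * out * (1.0 - out)
            # hidden-layer delta: the usual backpropagated term ...
            d_h = (d_out @ W2.T) * h * (1.0 - h)
            # ... plus the gradient of the saturation penalty lam * h * (1 - h);
            # its derivative w.r.t. h is lam * (1 - 2h), chained through sigmoid
            d_h += lam * (1.0 - 2.0 * h) * h * (1.0 - h)
            # gradient-descent weight updates
            W2 -= lr * h.T @ d_out / len(X)
            b2 -= lr * d_out.mean(axis=0)
            W1 -= lr * X.T @ d_h / len(X)
            b1 -= lr * d_h.mean(axis=0)
        return W1, b1, W2, b2

    # Toy usage on XOR: after training, the hidden activations sit near 0/1,
    # so each hidden unit reads as a crisp boolean feature for rule extraction.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)
    W1, b1, W2, b2 = train(X, y)
    print(np.round(sigmoid(X @ W1 + b1), 2))

Under this (assumed) formulation, a decompositional RE method can then threshold each near-binary hidden unit and enumerate the input conditions that turn it on, which is where a sharper hidden representation would translate into fewer extracted rules.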
