Class-Entropy minimisation networks for domain analysis and rule extraction

When applied to supervised classification problems, neural rule extraction aims at making classification mechanisms explicit by recovering, in symbolic form, the knowledge embedded in a network's connections. To this end, the present research borrows notions from information theory: using Conditional Class Entropy (CCE) as the network cost function improves representation efficiency by forcing the network to retain only task-essential information. We present a library of methods to analyse, simplify and rearrange the knowledge embedded in CCE-trained networks; the final result is a hierarchy of if-then rules modelling the classification process in symbolic form. Experimental results on a clinical testbed (diagnosis of Lyme disease) confirm the effectiveness and the generalisation power of the methodologies described. In addition, the satisfactory results obtained on this still unsolved clinical problem give the research interdisciplinary value.
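The paper's exact CCE cost function is not reproduced here, but the underlying quantity can be sketched. The following is a minimal illustration (not the authors' implementation; the function name and the membership-weighted probability estimates are assumptions for illustration) of conditional class entropy H(C|J): the average uncertainty about a sample's class label that remains once its (soft) assignment to a set of network units is known. Minimising this quantity drives each unit toward responding to samples of a single class, which is what makes rule extraction from the trained network tractable.

```python
import numpy as np

def conditional_class_entropy(memberships, labels, n_classes):
    """H(C|J) in bits, estimated from soft unit assignments.

    memberships: (n_samples, n_units) array, non-negative, rows sum to 1;
                 entry [i, j] is sample i's degree of assignment to unit j.
    labels:      (n_samples,) integer class labels in [0, n_classes).
    """
    n_samples, n_units = memberships.shape
    # p(j): expected fraction of the data assigned to unit j
    p_j = memberships.mean(axis=0)
    h = 0.0
    for j in range(n_units):
        mass_j = memberships[:, j].sum()
        if mass_j <= 0.0:
            continue  # unit attracts no samples; contributes nothing
        # p(c|j): class distribution among samples, weighted by membership in j
        p_c_given_j = np.array(
            [memberships[labels == c, j].sum() for c in range(n_classes)]
        ) / mass_j
        nonzero = p_c_given_j[p_c_given_j > 0.0]
        # accumulate p(j) * H(C | J = j)
        h -= p_j[j] * np.sum(nonzero * np.log2(nonzero))
    return h
```

With hard assignments that perfectly separate the classes, H(C|J) is zero; when every unit receives an even mixture of two classes, it reaches one bit. A training procedure in the spirit of the paper would treat a differentiable estimate of this quantity as the loss to be minimised.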
