Representation and generalization properties of class-entropy networks

Using conditional class entropy (CCE) as a cost function allows feedforward networks to exploit classification-relevant information fully. CCE-based networks partition the data space into regions, each of which is assigned an unambiguous symbol and labeled with class information. Through this labeling mechanism, the network can model the empirical data distribution at the local level. Region labeling evolves during network training, which follows a plastic algorithm. The paper proves several theoretical properties of CCE-based networks, covering both convergence during training and generalization ability at run time. In addition, analytical criteria and practical procedures are proposed to enhance the generalization performance of trained networks. Experiments on artificial and real-world domains confirm the accuracy of this class of networks and support the validity of the described methods.
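To make the cost function concrete, the sketch below computes the standard information-theoretic conditional entropy H(C | R) of class labels C given a region assignment R, and labels each region by its majority class. It is an illustration under stated assumptions, not the paper's training algorithm: the abstract does not give the exact CCE formula or the labeling rule, so the empirical-frequency estimate, the majority-vote labeling, and the function names `conditional_class_entropy` and `label_regions` are all hypothetical choices made here.

```python
import math
from collections import Counter, defaultdict


def _class_counts_per_region(regions, classes):
    """Group class labels by region symbol and count them."""
    if len(regions) != len(classes):
        raise ValueError("regions and classes must have the same length")
    if not regions:
        raise ValueError("at least one sample is required")
    per_region = defaultdict(Counter)
    for region, cls in zip(regions, classes):
        per_region[region][cls] += 1
    return per_region


def conditional_class_entropy(regions, classes):
    """Empirical H(C | R) in bits (assumed form, see lead-in).

    H(C | R) = sum_r p(r) * ( - sum_c p(c | r) * log2 p(c | r) )
    """
    per_region = _class_counts_per_region(regions, classes)
    n = len(regions)
    total = 0.0
    for counts in per_region.values():
        size = sum(counts.values())
        # Entropy of the class distribution inside this region.
        h_region = -sum((k / size) * math.log2(k / size) for k in counts.values())
        # Weight by the fraction of samples that fall in the region.
        total += (size / n) * h_region
    return total


def label_regions(regions, classes):
    """Assign each region its most frequent class (assumed labeling rule)."""
    per_region = _class_counts_per_region(regions, classes)
    return {r: counts.most_common(1)[0][0] for r, counts in per_region.items()}


if __name__ == "__main__":
    # Hypothetical toy data: region symbols produced by some partition.
    regions = ["a", "a", "b", "b", "b"]
    classes = [0, 0, 1, 1, 0]
    print(conditional_class_entropy(regions, classes))
    print(label_regions(regions, classes))
```

With this estimate, a partition whose regions each contain a single class has zero conditional entropy, while a region that mixes classes evenly contributes the most; this is the sense in which minimizing the quantity favors regions that can be labeled unambiguously.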
