We explore the role of entropy manipulation during learning in supervised multilayer perceptron (MLP) classifiers. We show that, for a two-layer MLP classifier, conditional entropy minimization in the internal layer is a necessary condition for error minimization in the mapping from input to output. We then examine the relationship between entropy and the expected volume and mass of a convex hull constructed from n sample points, and show that minimizing the expected hull volume may have more desirable gradient dynamics than minimizing entropy. We also show that entropy by itself has some geometrical invariance with respect to expected hull volumes. We develop closed-form expressions for the expected convex hull mass and volume in R^1 and relate these to error probability. Finally, we show that learning in an MLP may be accomplished solely by minimization of the conditional expected hull volumes and the expected volume of the "intensity of collision".
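As a small illustration of the two quantities being compared (not of the paper's derivation or closed-form results), the sketch below estimates the expected convex hull volume of n sample points in R^1, which reduces to the expected sample range E[max - min], via Monte Carlo, alongside a simple histogram plug-in estimate of differential entropy. All function names, the Gaussian example, and the chosen sample sizes are illustrative assumptions rather than anything specified in the paper.

```python
import numpy as np

def expected_hull_volume_1d(sample_fn, n, trials=2000, rng=None):
    """Monte Carlo estimate of E[max - min] for n i.i.d. draws in R^1.

    In R^1 the convex hull of n points is the interval [min, max],
    so its 'volume' is simply the range of the sample.
    """
    rng = np.random.default_rng() if rng is None else rng
    ranges = np.empty(trials)
    for t in range(trials):
        x = sample_fn(n, rng)
        ranges[t] = x.max() - x.min()
    return ranges.mean()

def entropy_estimate_1d(x, bins=50):
    """Simple histogram plug-in estimate of differential entropy (in nats)."""
    p, edges = np.histogram(x, bins=bins, density=True)
    widths = np.diff(edges)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]) * widths[nz])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for sigma in (0.5, 1.0, 2.0):
        draw = lambda n, r, s=sigma: r.normal(0.0, s, size=n)
        ev = expected_hull_volume_1d(draw, n=32, rng=rng)
        h = entropy_estimate_1d(draw(100_000, rng))
        # Scaling the data by s multiplies the expected hull volume by s,
        # while the differential entropy only shifts additively by log(s).
        print(f"sigma={sigma:.1f}  E[hull volume]~{ev:.3f}  entropy~{h:.3f} nats")
```

Running the script for the three Gaussian scales shows the expected hull volume growing in proportion to sigma while the entropy estimate changes only by an additive log term, which is the kind of geometric relationship between the two quantities that the abstract alludes to.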