On self-organizing algorithms and networks for class-separability features

We describe self-organizing learning algorithms and associated neural networks that extract features which preserve class separability. As a first step, we describe an adaptive algorithm for computing Q^(-1/2), where Q is the correlation or covariance matrix of a random vector sequence. Using stochastic approximation theory, we prove that this algorithm converges with probability one, and we describe a single-layer linear network architecture that implements it, which we call the Q^(-1/2) network. Using this network, we describe feature extraction architectures for: 1) unimodal and multicluster Gaussian data in the multiclass case; 2) multivariate linear discriminant analysis (LDA) in the multiclass case; and 3) the Bhattacharyya distance measure in the two-class case. The LDA and Bhattacharyya distance features are extracted by concatenating the Q^(-1/2) network with a principal component analysis network, and the resulting two-layer network is proven to converge with probability one. All of the networks are trained on a flow, or sequence, of inputs. Numerical studies of the networks' performance on multiclass random data are presented.
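The abstract does not state the update rule, so the following is only a minimal sketch of one way to estimate Q^(-1/2) adaptively from a stream of samples. It assumes the stochastic approximation update W <- W + eta_k (I - W x x^T W), whose symmetric positive definite fixed point satisfies W Q W = I. The function name `adaptive_inv_sqrt`, the step-size schedule eta_k = 1/(k0 + k), and the test matrix `Q` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def adaptive_inv_sqrt(samples, k0=100.0):
    """Estimate Q^(-1/2) from a stream of zero-mean samples (one per row).

    Assumed update (not confirmed by the abstract):
        W <- W + eta_k * (I - W x x^T W),  eta_k = 1 / (k0 + k).
    Starting from a symmetric W keeps every iterate symmetric.
    """
    n = samples.shape[1]
    identity = np.eye(n)
    W = np.eye(n)  # symmetric positive definite starting point
    for k, x in enumerate(samples):
        eta = 1.0 / (k0 + k)  # decreasing gain, as stochastic approximation requires
        u = W @ x             # output of the single-layer linear network
        W += eta * (identity - np.outer(u, u))
    return W


# Illustrative covariance matrix (hypothetical test data).
Q = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.5, 0.3],
              [0.0, 0.3, 1.0]])

rng = np.random.default_rng(0)
chol = np.linalg.cholesky(Q)
# Rows are x = chol @ z with z ~ N(0, I), so each row has covariance Q.
samples = rng.standard_normal((20000, 3)) @ chol.T

W = adaptive_inv_sqrt(samples)
residual = np.max(np.abs(W @ Q @ W - np.eye(3)))
print(f"max |W Q W - I| = {residual:.3f}")
```

If the estimate is close to Q^(-1/2), the printed residual should be small; the transformed vectors W x are then approximately whitened, which is the property the later feature extraction stages rely on.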
