A comparative investigation on subspace dimension determination

It is well known that constrained Hebbian self-organization on multiple linear neural units converges to the k-dimensional subspace spanned by the first k principal components. Not only has the batch PCA algorithm been widely applied in various fields since the 1930s, but a variety of adaptive algorithms have also been proposed over the past two decades. However, most studies assume that the dimension k is known or determine it heuristically, even though a number of model selection criteria exist in the statistics literature. Recently, criteria have also been derived under the framework of Bayesian Ying-Yang (BYY) harmony learning. This paper further investigates the BYY criteria in comparison with typical existing criteria, including Akaike's information criterion (AIC), the consistent Akaike's information criterion (CAIC), the Bayesian information criterion (BIC), and the cross-validation (CV) criterion. The comparative study is carried out via experiments not only on simulated data sets of different sample sizes, noise variances, data space dimensions, and subspace dimensions, but also on two real data sets, one from an air pollution problem and one of sports track records. The experiments show that BIC outperforms AIC, CAIC, and CV, while the BYY criteria are either comparable with or better than BIC. BYY harmony learning is therefore the preferred tool for subspace dimension determination, especially since the appropriate subspace dimension k can be determined automatically while BYY harmony learning fits the principal subspace, whereas selection of k by BIC, AIC, CAIC, or CV must be made in a second stage over a set of candidate subspaces of different dimensions obtained in a first learning stage.
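As a rough illustration of the two-stage procedure that the abstract contrasts with BYY harmony learning, the sketch below scores candidate subspace dimensions with AIC, CAIC, and BIC under a probabilistic PCA likelihood. The PPCA log-likelihood, the parameter count m(k), and the names ppca_log_likelihood and select_dimension are illustrative assumptions for this sketch only, not the paper's actual formulation, models, or code; the CV criterion is omitted since it requires repeated fitting on held-out splits.

```python
# Minimal sketch (assumed setup, not the paper's procedure): two-stage subspace
# dimension selection with AIC / CAIC / BIC under a probabilistic PCA likelihood.
import numpy as np

def ppca_log_likelihood(X, k):
    """Maximized PPCA log-likelihood for a k-dimensional principal subspace."""
    N, d = X.shape
    S = np.cov(X, rowvar=False, bias=True)           # ML sample covariance
    eigvals = np.sort(np.linalg.eigvalsh(S))[::-1]   # eigenvalues, descending
    sigma2 = eigvals[k:].mean()                      # ML estimate of noise variance
    log_det = np.sum(np.log(eigvals[:k])) + (d - k) * np.log(sigma2)
    return -0.5 * N * (d * np.log(2.0 * np.pi) + log_det + d)

def select_dimension(X, k_max, criterion="BIC"):
    """Stage 1: evaluate each candidate k; stage 2: pick k minimizing the criterion."""
    N, d = X.shape
    scores = {}
    for k in range(1, k_max + 1):
        L = ppca_log_likelihood(X, k)
        m = d + d * k - k * (k - 1) / 2 + 1          # free parameters in a k-dim PPCA model
        if criterion == "AIC":
            scores[k] = -2 * L + 2 * m
        elif criterion == "CAIC":
            scores[k] = -2 * L + m * (np.log(N) + 1)
        else:                                        # BIC
            scores[k] = -2 * L + m * np.log(N)
    return min(scores, key=scores.get)

# Example: a 3-dimensional subspace embedded in 10-dimensional noisy data.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3)) @ rng.standard_normal((3, 10)) \
    + 0.1 * rng.standard_normal((500, 10))
print(select_dimension(X, k_max=9, criterion="BIC"))
```

In this sketch the first stage reduces to an eigendecomposition of the sample covariance because the PPCA maximum likelihood solution is available in closed form; a BYY harmony learning implementation would instead determine k automatically during a single adaptive fit of the principal subspace.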
