Advances on BYY harmony learning: information theoretic perspective, generalized projection geometry, and independent factor autodetermination

The nature of Bayesian Ying-Yang harmony learning is reexamined from an information theoretic perspective. Not only is its ability for model selection and regularization explained with new insights, but its relations to, and differences from, studies of minimum description length (MDL), the Bayesian approach, bits-back based MDL, the Akaike information criterion (AIC), maximum likelihood, information geometry, Helmholtz machines, and variational approximation are also discussed. Moreover, a generalized projection geometry is introduced to further explain this new mechanism. Furthermore, new algorithms are developed for implementing Gaussian factor analysis (FA) and non-Gaussian factor analysis (NFA) such that an appropriate number of factors is selected automatically during parameter learning.
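
The Gaussian FA part can be pictured with a small sketch. The following Python/NumPy code runs the classical EM updates for the factor analysis model x = Ay + e (with y ~ N(0, I) and diagonal noise covariance) and adds a simple pruning heuristic: loading columns whose norm shrinks below a threshold are discarded during learning, so the effective number of factors is determined on the fly. The function name fa_em_autoprune, the threshold tau, and the pruning rule are illustrative assumptions for exposition only; they mimic the flavor of automatic factor determination but are not the paper's BYY harmony updates.

```python
import numpy as np

def fa_em_autoprune(X, k_max=10, n_iter=200, tau=1e-2, seed=0):
    """EM for Gaussian factor analysis x = A y + e, with y ~ N(0, I_k)
    and e ~ N(0, diag(psi)). Factors whose loading columns shrink below
    tau are pruned, so the effective dimension is chosen during learning
    (an illustrative stand-in for automatic factor determination, not
    the BYY harmony learning update itself)."""
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)            # center the data
    n, d = X.shape
    S = X.T @ X / n                   # sample covariance (d x d)
    A = 0.1 * rng.standard_normal((d, k_max))
    psi = np.diag(S).copy()           # diagonal noise variances
    for _ in range(n_iter):
        k = A.shape[1]
        # E-step: posterior moments of y given x
        beta = A.T @ np.linalg.inv(A @ A.T + np.diag(psi))   # k x d
        Eyy = np.eye(k) - beta @ A + beta @ S @ beta.T       # avg E[yy^T]
        # M-step: update loadings and noise variances
        A = S @ beta.T @ np.linalg.inv(Eyy)
        psi = np.maximum(np.diag(S - A @ beta @ S), 1e-8)
        # prune factors whose loadings have become negligible
        keep = np.linalg.norm(A, axis=0) > tau
        A = A[:, keep]
    return A, psi
```

As a quick check, on data generated from, say, three latent factors in ten observed dimensions with k_max = 8, the surviving columns of A should settle near the true factor count, since redundant loading columns tend toward zero when the data admit a lower-dimensional explanation.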
