New advances on Bayesian Ying-Yang learning system with Kullback and non-Kullback separation functionals

In this paper, we extend Bayesian-Kullback Ying-Yang (BKYY) learning into a much broader Bayesian Ying-Yang (BYY) learning system by using different separation functionals instead of only the Kullback divergence, and we elaborate on the power of BYY learning as a general learning theory for parameter learning, scale selection, structure evaluation, regularization and sampling design. Improved criteria are proposed for selecting the number of densities in finite mixtures and Gaussian mixtures, the number of clusters in MSE clustering, the subspace dimension in PCA-related methods, the number of expert nets in the mixture of experts and its alternative model, and the number of basis functions in RBF nets. Three categories of non-Kullback separation functionals, namely convex divergence, $L_p$ divergence and decorrelation index, are suggested for BYY learning as alternatives to the learning models based on Kullback divergence, and some of their properties are discussed. As examples, the EM algorithms for the finite mixture, the mixture of experts and its alternative model are derived with convex divergence.
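For orientation, the Kullback case that the abstract generalizes can be sketched as follows. The notation is an assumption made here for illustration and is not taken from the abstract: $x$ denotes an observation, $y$ its inner representation, $p(y \mid x)\,p(x)$ the Yang factorization of the joint density and $q(x \mid y)\,q(y)$ the Ying factorization.

```latex
% Sketch of the Kullback separation functional between the two
% factorizations of the joint density (notation assumed, not from the abstract).
\[
  KL(p \,\|\, q)
  = \int p(y \mid x)\, p(x)\,
    \ln \frac{p(y \mid x)\, p(x)}{q(x \mid y)\, q(y)}
    \, dx \, dy \;\ge\; 0 ,
\]
% Equality holds if and only if the two factorizations agree
% almost everywhere: p(y|x) p(x) = q(x|y) q(y).
```

Under this reading, learning minimizes the functional over the free components of both factorizations, and the non-Kullback functionals named in the abstract would take the place of the logarithmic integrand while keeping the same two-factorization setup.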
