Learning local factor analysis versus mixture of factor analyzers with automatic model selection

By fitting a Factor Analysis (FA) model to each component of a Gaussian Mixture Model (GMM), clustering and local dimensionality reduction can be addressed simultaneously by the Mixture of Factor Analyzers (MFA) and Local Factor Analysis (LFA), which correspond to two different FA parameterizations. This paper investigates the performance of Variational Bayes (VB) and Bayesian Ying-Yang (BYY) harmony learning on MFA and LFA for automatically determining the number of components and the local hidden dimensionalities (i.e., the number of factors of the FA model in each component). Paralleling the existing VB learning algorithm for MFA, we develop an alternative VB algorithm for LFA with an analogous conjugate Dirichlet-Normal-Gamma (DNG) prior on all of its parameters, and we also develop the corresponding BYY algorithms for both MFA and LFA. A wide range of synthetic experiments shows that LFA is superior to MFA for model selection under either VB or BYY, while BYY reliably outperforms VB on both MFA and LFA. These empirical findings are consistently observed in real-world applications, not only in clustering face and handwritten-digit images but also in unsupervised image segmentation.
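For concreteness, here is a minimal sketch of the two FA parameterizations in question, in generic notation that is assumed for illustration rather than taken from the paper. Within component k, MFA models an observation x as

\[ x = A_k y + \mu_k + e_k, \qquad y \sim \mathcal{N}(0, I_{m_k}), \qquad e_k \sim \mathcal{N}(0, \Psi_k), \]

with an unconstrained loading matrix A_k and diagonal noise covariance \Psi_k, whereas LFA expresses the same marginal covariance family as

\[ x = U_k y + \mu_k + e_k, \qquad y \sim \mathcal{N}(0, \Lambda_k), \qquad U_k^{\mathsf{T}} U_k = I_{m_k}, \]

with orthonormal loading columns U_k and a diagonal factor covariance \Lambda_k that carries the factor variances explicitly. Automatic model selection then amounts to jointly determining the number of components K and each local hidden dimensionality m_k.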
