Deterministic Latent Variable Models and Their Pitfalls

We derive a number of well known deterministic latent variable models such as PCA, ICA, EPCA, NMF and PLSA as variational EM approximations with point posteriors. We show that the often practiced heuristic of “folding-in” can lead to overly optimistic estimates of the test-set log-likelihood and we verify this result experimentally. We trace this problem back to an infinitely negative entropy term that is ignored in the variational approximation.

[1]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[2]  David J. Field,et al.  Sparse coding with an overcomplete basis set: A strategy employed by V1? , 1997, Vision Research.

[3]  Sam T. Roweis,et al.  EM Algorithms for PCA and SPCA , 1997, NIPS.

[4]  Geoffrey E. Hinton,et al.  A View of the Em Algorithm that Justifies Incremental, Sparse, and other Variants , 1998, Learning in Graphical Models.

[5]  H. Sebastian Seung,et al.  Learning the parts of objects by non-negative matrix factorization , 1999, Nature.

[6]  Thomas Hofmann,et al.  Probabilistic Latent Semantic Analysis , 1999, UAI.

[7]  Sanjoy Dasgupta,et al.  A Generalization of Principal Components Analysis to the Exponential Family , 2001, NIPS.

[8]  Wray L. Buntine Variational Extensions to EM and Multinomial PCA , 2002, ECML.

[9]  Ata Kabán,et al.  On an equivalence between PLSI and LDA , 2003, SIGIR.

[10]  Yee Whye Teh,et al.  Energy-Based Models for Sparse Overcomplete Representations , 2003, J. Mach. Learn. Res..

[11]  Chris Beytes Head and tails , 2003 .

[12]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[13]  Tatsuya Kawahara,et al.  Language Model Adaptation Based on PLSA of Topics and Speakers for Automatic Transcription of Panel Discussions , 2004, IEICE Trans. Inf. Syst..

[14]  Richard S. Zemel,et al.  The multiple multiplicative factor model for collaborative filtering , 2004, ICML.

[15]  Thorsten Brants,et al.  Test Data Likelihood for PLSA Models , 2005, Information Retrieval.

[16]  Doug Downey,et al.  Heads and tails: studies of web search with common and rare queries , 2007, SIGIR.

[17]  Thomas L. Griffiths,et al.  A probabilistic approach to semantic representation , 2019, Proceedings of the Twenty-Fourth Annual Conference of the Cognitive Science Society.