Asymptotic Bayesian Generalization Error in Latent Dirichlet Allocation and Stochastic Matrix Factorization

Latent Dirichlet allocation (LDA) is useful in document analysis, image processing, and many information systems; however, its generalization performance has remained unknown because it is a singular learning machine to which regular statistical theory cannot be applied. Stochastic matrix factorization (SMF) is a restricted matrix factorization in which the matrix factors are stochastic; that is, each column of a factor lies in a simplex. SMF has been applied to image recognition and text mining. SMF can be understood as a statistical model in which a stochastic matrix of given data is represented by a product of two stochastic matrices; its generalization performance has also remained unknown because of non-regularity. In this paper, using an algebraic and geometric method, we show the analytic equivalence of LDA and SMF: both have the same real log canonical threshold (RLCT), so they asymptotically have the same Bayesian generalization error and the same log marginal likelihood. Moreover, we derive an upper bound of the RLCT and prove that it is smaller than half the dimension of the parameter; hence the Bayesian generalization errors of LDA and SMF are smaller than those of regular statistical models.
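As a small illustration of the SMF setting described above, the following sketch builds two column-stochastic matrices and checks that their product is again column-stochastic. It is not the paper's method: the sizes `M`, `H`, `N`, the helper names, and the uniform Dirichlet sampling are all assumptions chosen only for illustration.

```python
import random


def random_simplex_point(dim, rng):
    """Sample a point of the probability simplex (hypothetical helper).

    Normalized Gamma(1, 1) draws follow a uniform Dirichlet distribution.
    """
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(dim)]
    total = sum(draws)
    return [d / total for d in draws]


def random_column_stochastic(rows, cols, rng):
    """Return a rows x cols matrix (list of rows) whose columns lie in a simplex."""
    columns = [random_simplex_point(rows, rng) for _ in range(cols)]
    # Transpose the list of columns into a list of rows.
    return [[columns[j][i] for j in range(cols)] for i in range(rows)]


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    inner = len(B)
    return [
        [sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def is_column_stochastic(X, tol=1e-9):
    """Check that all entries are nonnegative and every column sums to one."""
    nonneg = all(x >= 0.0 for row in X for x in row)
    sums_ok = all(
        abs(sum(row[j] for row in X) - 1.0) <= tol for j in range(len(X[0]))
    )
    return nonneg and sums_ok


rng = random.Random(0)
M, H, N = 5, 3, 4  # assumed sizes: data is M x N, inner dimension is H
A = random_column_stochastic(M, H, rng)  # first stochastic factor
B = random_column_stochastic(H, N, rng)  # second stochastic factor
C = matmul(A, B)  # the stochastic matrix represented by the product
```

Each column of `C` is a convex combination of the columns of `A`, which is why the product stays column-stochastic.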
