Paper information - Asymptotic Bayesian Generalization Error in Latent Dirichlet Allocation and Stochastic Matrix Factorization

Latent Dirichlet allocation (LDA) is useful in document analysis, image processing, and many information systems; however, its generalization performance has remained unknown because it is a singular learning machine to which regular statistical theory cannot be applied. Stochastic matrix factorization (SMF) is a restricted matrix factorization in which the matrix factors are stochastic, that is, each column of a factor lies in a simplex. SMF is being applied to image recognition and text mining. SMF can be understood as a statistical model in which a stochastic matrix of given data is represented by a product of two stochastic matrices; its generalization performance has also remained unknown because of non-regularity. In this paper, using an algebraic and geometric method, we show the analytic equivalence of LDA and SMF: both have the same real log canonical threshold (RLCT), so they asymptotically have the same Bayesian generalization error and the same log marginal likelihood. Moreover, we derive an upper bound of the RLCT and prove that it is smaller than the dimension of the parameter divided by two; hence their Bayesian generalization errors are smaller than those of regular statistical models.
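The SMF setting in the abstract can be illustrated with a minimal sketch, which is not the authors' code. It assumes the column-stochastic convention (each column sums to one), and it assumes the parameter dimension is counted by removing one degree of freedom per column. The function names and the sizes M, H, N are hypothetical. The sketch checks that a product of two stochastic matrices is again stochastic, and it computes d/2, the value the abstract compares the RLCT against.

```python
import numpy as np

def random_column_stochastic(rows, cols, rng):
    """Return a rows x cols matrix whose columns each lie in the probability simplex."""
    # Each column is an independent Dirichlet(1, ..., 1) draw: nonnegative, sums to 1.
    return rng.dirichlet(np.ones(rows), size=cols).T

def smf_parameter_dimension(M, H, N):
    """Free parameters of an (M x H) and an (H x N) column-stochastic factor.

    Assumption: every column loses one degree of freedom to the sum-to-one constraint.
    """
    return (M - 1) * H + (H - 1) * N

rng = np.random.default_rng(0)
M, H, N = 5, 3, 4  # hypothetical sizes, chosen only for illustration

A = random_column_stochastic(M, H, rng)  # first stochastic factor
B = random_column_stochastic(H, N, rng)  # second stochastic factor
C = A @ B                                # the product is again column-stochastic

column_sums = C.sum(axis=0)
d = smf_parameter_dimension(M, H, N)
regular_coefficient = d / 2  # value that the abstract says the RLCT stays below

print("column sums of A @ B:", column_sums)
print("parameter dimension d:", d, "and d/2:", regular_coefficient)
```

With these sizes the parameter count is d = 20, so a regular model of the same dimension would have coefficient d/2 = 10; the abstract's result is that the RLCT of SMF (and of LDA) is smaller than this value.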

Naoki Hayashi | Sumio Watanabe
