Estimation of unnormalized statistical models without numerical integration

Parametric statistical models of continuous- or discrete-valued data are often not properly normalized, that is, they do not integrate or sum to unity. Normalization is, however, essential for maximum likelihood estimation. In principle, a model can always be normalized by dividing it by its integral or sum (its partition function), but in practice computing this quantity can be extremely difficult. We have been developing methods for estimating unnormalized models that do not approximate the partition function by numerical integration. We review these methods, score matching and noise-contrastive estimation, point out extensions and connections both between them and to methods by other authors, and discuss their pros and cons.
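To illustrate how an unnormalized model can be estimated without ever computing its partition function, the following is a minimal sketch of score matching for a one-dimensional Gaussian whose log-density is specified only up to an additive constant. The model, the parameter names (`mu`, `lam`), the function names, and the synthetic data are all assumptions chosen for illustration and are not taken from the paper. The objective used is the standard one-dimensional score matching form, the sample average of psi'(x) + 0.5 * psi(x)^2, where psi is the derivative of the model's log-density with respect to x.

```python
import random


def score_matching_objective(data, mu, lam):
    """Empirical score matching objective for the unnormalized model
    log p~(x) = -0.5 * lam * (x - mu)**2, with the normalizing constant omitted."""
    total = 0.0
    for x in data:
        psi = -lam * (x - mu)   # model score: d/dx log p~(x)
        dpsi = -lam             # derivative of the score with respect to x
        total += dpsi + 0.5 * psi ** 2
    return total / len(data)


def fit_by_score_matching(data):
    """Closed-form minimizer of the objective above for this particular model:
    mu is the sample mean and lam is the inverse of the sample variance."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    return mu, 1.0 / var


# Synthetic data (illustrative assumption): Gaussian with mean 1.5, precision 4.
rng = random.Random(0)
true_mu, true_lam = 1.5, 4.0
data = [rng.gauss(true_mu, true_lam ** -0.5) for _ in range(20000)]

mu_hat, lam_hat = fit_by_score_matching(data)
print("estimated mean:", mu_hat)
print("estimated precision:", lam_hat)
print("objective at estimate:", score_matching_objective(data, mu_hat, lam_hat))
```

Note that the partition function never appears: the score is a derivative with respect to x, so any additive constant in the log-density drops out. For this toy model the estimate coincides with the maximum likelihood estimate, which is a property of the Gaussian case and should not be expected in general.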
