A Family of Computationally Efficient and Simple Estimators for Unnormalized Statistical Models

We introduce a new family of estimators for unnormalized statistical models. The family is parameterized by two nonlinear functions and uses a single sample from an auxiliary distribution, generalizing the Maximum Likelihood Monte Carlo estimation of Geyer and Thompson (1992). It is constructed so that the partition function can be estimated like any other parameter of the model. Estimation is done by optimizing an algebraically simple, well-defined objective function, which allows dedicated optimization methods to be used. We establish consistency of the estimator family and give an expression for the asymptotic covariance matrix, which lets us analyze the influence of the nonlinearities and of the auxiliary density on estimation performance. Some estimators in our family are particularly stable over a wide range of auxiliary densities. Interestingly, a specific choice of the nonlinearity establishes a connection between density estimation and classification by nonlinear logistic regression. Finally, the optimal number of auxiliary samples relative to the given number of data points is considered from the perspective of computational efficiency.
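
As a rough illustration of the logistic-regression connection noted above (the special case corresponding to noise-contrastive estimation), the following sketch fits an unnormalized one-dimensional Gaussian by logistic regression between the data and a single auxiliary sample, treating the log partition function as an ordinary parameter. The model, the auxiliary density, and all names (`log_phi`, `nce_loss`) are assumptions made for illustration, not the paper's implementation.

```python
# Minimal sketch of the logistic-regression special case for unnormalized models.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=1000)   # observed data
y = rng.normal(loc=0.0, scale=3.0, size=1000)   # single auxiliary ("noise") sample

def log_p_noise(u):
    # Known auxiliary density: a wide Gaussian.
    return norm.logpdf(u, loc=0.0, scale=3.0)

def log_phi(u, theta):
    # Unnormalized log-model: Gaussian with unknown mean mu; c plays the role
    # of the log partition function and is estimated like any other parameter.
    mu, c = theta
    return -0.5 * (u - mu) ** 2 - c

def nce_loss(theta):
    # Log-odds between the unnormalized model and the auxiliary density.
    G_x = log_phi(x, theta) - log_p_noise(x)
    G_y = log_phi(y, theta) - log_p_noise(y)
    # Negative logistic-regression log-likelihood:
    # data points labeled 1, auxiliary points labeled 0.
    return -(np.mean(-np.logaddexp(0.0, -G_x)) + np.mean(-np.logaddexp(0.0, G_y)))

theta_hat = minimize(nce_loss, x0=np.zeros(2)).x
print("estimated mean:", theta_hat[0])
print("estimated log partition function:", theta_hat[1])
```

With these settings the estimate of c should come out near log(sqrt(2*pi)) ≈ 0.92, the true log partition function of a unit-variance Gaussian, illustrating that the normalizer is recovered together with the mean rather than computed by separate integration.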

[1] Erkki Oja et al. Independent Component Analysis, 2001.

[2] Aapo Hyvärinen et al. Estimating Markov Random Field Potentials for Natural Images. ICA, 2009.

[3] Michael J. Black et al. Fields of Experts. International Journal of Computer Vision, 2009.

[4] B. Ripley et al. Robust Statistics. Encyclopedia of Mathematical Geosciences, 2018.

[5] A. Hyvärinen. Estimation of Non-normalized Statistical Models, 2009.

[6] Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence. Neural Computation, 2002.

[7] Larry Wasserman. All of Statistics: A Concise Course in Statistical Inference. Springer, 2004.

[8] Aapo Hyvärinen et al. A Two-Layer Model of Natural Stimuli Estimated with Score Matching. Neural Computation, 2010.

[9] Shun-ichi Amari. Natural Gradient Works Efficiently in Learning. Neural Computation, 1998.

[10] Trevor Hastie et al. The Elements of Statistical Learning, 2001.

[11] Michael U. Gutmann and Aapo Hyvärinen. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models. AISTATS, 2010.

[12] J. Besag. Spatial Interaction and the Statistical Analysis of Lattice Systems. Journal of the Royal Statistical Society, Series B, 1974.

[13] Geoffrey E. Hinton et al. A Learning Algorithm for Boltzmann Machines. Cognitive Science, 1985.

[14] Geoffrey E. Hinton et al. Topographic Product Models Applied to Natural Scene Statistics. Neural Computation, 2006.

[15] C. J. Geyer and E. A. Thompson. Constrained Monte Carlo Maximum Likelihood for Dependent Data. Journal of the Royal Statistical Society, Series B, 1992.