Noise-Contrastive Estimation of Unnormalized Statistical Models, with Applications to Natural Image Statistics

We consider the task of estimating, from observed data, a probabilistic model that is parameterized by a finite number of parameters. In particular, we consider the situation where the model probability density function is unnormalized, that is, where the model is specified only up to the partition function. The partition function normalizes a model so that it integrates to one for any choice of the parameters, but it is often impossible to obtain in closed form. Gibbs distributions, Markov networks, and multi-layer networks are examples of models where analytical normalization is often impossible. Maximum likelihood estimation then cannot be used without resorting to numerical approximations, which are often computationally expensive. We propose here a new objective function for the estimation of both normalized and unnormalized models. The basic idea is to perform nonlinear logistic regression to discriminate between the observed data and some artificially generated noise. With this approach, the normalizing partition function can be estimated like any other parameter. We prove that the new estimation method leads to a consistent (convergent) estimator of the parameters. For large noise sample sizes, the new estimator is furthermore shown to behave like the maximum likelihood estimator. In the estimation of unnormalized models, there is a trade-off between statistical and computational performance. We show that the new method strikes a competitive trade-off in comparison to other estimation methods for unnormalized models. As an application to real data, we estimate novel two-layer models of natural image statistics with spline nonlinearities.
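The basic idea described above, logistic regression between data and noise with the log-normalizer treated as an ordinary parameter, can be illustrated with a small sketch. It is not the authors' implementation: the one-dimensional model `log p(u) = -0.5 * prec * u**2 + c`, the Gaussian noise distribution, the equal data and noise sample sizes, the plain gradient ascent, and all names (`nce_fit`, `prec`, `c`, `noise_std`) are assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_noise_pdf(u, noise_std):
    # Log density of the zero-mean Gaussian noise (assumed known and normalized).
    return -0.5 * (u / noise_std) ** 2 - math.log(noise_std * math.sqrt(2 * math.pi))


def nce_fit(data, noise, noise_std, steps=400, lr=0.5):
    """Fit log p(u) = -0.5 * prec * u**2 + c by classifying data against noise.

    c plays the role of the negative log partition function and is estimated
    like any other parameter. Assumes len(data) == len(noise).
    """
    prec, c = 0.5, 0.0  # arbitrary starting point
    for _ in range(steps):
        g_prec = 0.0
        g_c = 0.0
        for x in data:
            # G = log-model minus log-noise; h = P(sample is data | sample).
            h = sigmoid(-0.5 * prec * x * x + c - log_noise_pdf(x, noise_std))
            g_prec += (1.0 - h) * (-0.5 * x * x) / len(data)
            g_c += (1.0 - h) / len(data)
        for y in noise:
            h = sigmoid(-0.5 * prec * y * y + c - log_noise_pdf(y, noise_std))
            g_prec -= h * (-0.5 * y * y) / len(noise)
            g_c -= h / len(noise)
        # Gradient ascent on the logistic-regression log-likelihood.
        prec += lr * g_prec
        c += lr * g_c
    return prec, c


rng = random.Random(0)
n = 3000
data = [rng.gauss(0.0, 1.0) for _ in range(n)]   # "observed" data, N(0, 1)
noise = [rng.gauss(0.0, 1.5) for _ in range(n)]  # artificial noise, N(0, 1.5^2)

prec_hat, c_hat = nce_fit(data, noise, noise_std=1.5)
true_c = -0.5 * math.log(2 * math.pi)  # log normalizer of N(0, 1)
print("precision estimate:", prec_hat, "(true value 1.0)")
print("log-normalizer estimate:", c_hat, "(true value", true_c, ")")
```

In this toy setting the true precision is 1 and the true log-normalizer is -0.5 log(2π), so both estimates can be compared against known values. Because the log-ratio is linear in `prec` and `c`, the objective here is concave and simple gradient ascent suffices; richer models would generally need a more capable optimizer.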
