Neural Likelihoods via Cumulative Distribution Functions

We leverage neural networks as universal approximators of monotonic functions to parameterize conditional cumulative distribution functions (CDFs). By applying automatic differentiation first with respect to the response variables and then with respect to the parameters of this CDF representation, we obtain black-box CDF and density estimators. We introduce a suite of families as alternative constructions for the multivariate case. At one extreme, the simplest construction is a density estimator competitive with state-of-the-art deep learning methods, although it does not provide an easily computable representation of multivariate CDFs. At the other extreme, a flexible construction yields multivariate CDF evaluations and marginalizations via a single forward pass through a deep neural network, but its likelihood computation scales exponentially with dimensionality. Alternatives between these extremes are also discussed. We evaluate the different representations empirically on a variety of tasks involving tail-area probabilities, tail dependence, and (partial) density estimation.
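To make the construction concrete, below is a minimal sketch (in JAX, and not the paper's exact architecture) of the univariate idea: a CDF F(y) is parameterized by an MLP whose weights are made positive via softplus and whose activations are sigmoids, so the output is monotone non-decreasing in y, and the density f(y) = dF/dy is recovered by automatic differentiation. All names, layer sizes, and initializations here are illustrative assumptions; conditioning covariates are omitted for clarity.

```python
# A minimal sketch, assuming a monotone MLP parameterization of a scalar CDF.
import jax
import jax.numpy as jnp

def init_params(key, sizes=(1, 16, 16, 1)):
    # One (weights, bias) pair per layer; weights are unconstrained here
    # and mapped through softplus inside cdf() to enforce positivity.
    keys = jax.random.split(key, len(sizes) - 1)
    return [(0.5 * jax.random.normal(k, (m, n)), jnp.zeros(n))
            for k, m, n in zip(keys, sizes[:-1], sizes[1:])]

def cdf(params, y):
    # Positive weights + monotone (sigmoid) activations make the network
    # output monotone non-decreasing in the scalar response y.
    h = jnp.atleast_1d(y)
    for W, b in params[:-1]:
        h = jax.nn.sigmoid(h @ jax.nn.softplus(W) + b)
    W, b = params[-1]
    # A final sigmoid squashes the output into (0, 1), as a CDF requires.
    return jax.nn.sigmoid(h @ jax.nn.softplus(W) + b).squeeze()

# The density f(y) = dF/dy falls out of automatic differentiation;
# differentiating the resulting log-likelihood w.r.t. params trains the model.
pdf = jax.grad(cdf, argnums=1)

params = init_params(jax.random.PRNGKey(0))
print(cdf(params, 0.3), pdf(params, 0.3))
```

In the multivariate case the likelihood is instead the mixed partial derivative ∂^d F / ∂y_1 ⋯ ∂y_d of the joint CDF, which is why, for the flexible construction above, exact likelihood evaluation via nested differentiation becomes exponentially expensive in the dimension d.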
