Deep Nonparametric Estimation of Discrete Conditional Distributions via Smoothed Dyadic Partitioning

We present an approach to deep estimation of discrete conditional probability distributions. Such models have several applications, including generative modeling of audio, image, and video data. Our approach combines two main techniques: dyadic partitioning and graph-based smoothing of the discrete space. By recursively decomposing each dimension into a series of binary splits and smoothing the resulting distribution with graph-based trend filtering, we impose a strict structure on the model and achieve much higher sample efficiency. We demonstrate the advantages of our model through a series of benchmarks on both synthetic and real-world datasets, in some cases reducing the error by nearly half compared with other popular methods in the literature. All of our models are implemented in TensorFlow and publicly available at this https URL.
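The two ingredients named above can be illustrated with a minimal NumPy sketch for a single dimension. It is not the released TensorFlow implementation. The function names `dyadic_bin_probs` and `chain_trend_penalty`, the heap ordering of the split logits, and the use of a simple chain graph for the smoothing penalty are all assumptions made for illustration. In a full model, the split logits would presumably be the outputs of a neural network conditioned on the input.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def dyadic_bin_probs(node_logits):
    """Map 2**depth - 1 binary-split logits to probabilities over 2**depth bins.

    Assumed layout (hypothetical): heap order, where node 0 is the root and
    node i has children 2*i + 1 (left) and 2*i + 2 (right). sigmoid(logit)
    is the probability of taking the left branch at that node.
    """
    node_logits = np.asarray(node_logits, dtype=float)
    n_bins = node_logits.size + 1
    depth = n_bins.bit_length() - 1
    if 2 ** depth != n_bins:
        raise ValueError("expected 2**depth - 1 split logits")

    p_left = sigmoid(node_logits)
    probs = np.ones(1)  # all mass starts at the root
    start = 0
    for level in range(depth):
        width = 2 ** level
        left = p_left[start:start + width]
        # Each node passes its mass to its left and right children.
        probs = np.stack([probs * left, probs * (1.0 - left)], axis=1).reshape(-1)
        start += width
    return probs


def chain_trend_penalty(values, order=1):
    """L1 norm of the (order + 1)-th differences along a 1-D chain of bins.

    This is the trend filtering penalty on a chain graph; a general graph
    would use a graph difference operator instead (not shown here).
    """
    return np.abs(np.diff(np.asarray(values, dtype=float), n=order + 1)).sum()


# Example: 7 split logits define a distribution over 8 ordered bins.
rng = np.random.default_rng(0)
probs = dyadic_bin_probs(rng.normal(size=7))
penalty = chain_trend_penalty(np.log(probs), order=1)
```

Because every leaf probability is a product of branch probabilities along one root-to-leaf path, the output sums to one by construction, and each prediction needs only `2**depth - 1` binary decisions rather than an unstructured softmax over all bins.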
