Generative Image Modeling Using Spatial LSTMs

Modeling the distribution of natural images is challenging, partly because of strong statistical dependencies which can extend over hundreds of pixels. Recurrent neural networks have been successful in capturing long-range dependencies in a number of problems but only recently have found their way into generative image models. We here introduce a recurrent image model based on multidimensional long short-term memory units which are particularly suited for image modeling due to their spatial structure. Our model scales to images of arbitrary size and its likelihood is computationally tractable. We find that it outperforms the state of the art in quantitative comparisons on several image datasets and produces promising results when used for texture synthesis and inpainting.

[1]  Thomas M. Cover,et al.  Elements of Information Theory , 2005 .

[2]  Radford M. Neal Connectionist Learning of Belief Networks , 1992, Artif. Intell..

[3]  L. Tierney Markov Chains for Exploring Posterior Distributions , 1994 .

[4]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[5]  Terrence J. Sejnowski,et al.  The “independent components” of natural scenes are edge filters , 1997, Vision Research.

[6]  J. V. van Hateren,et al.  Independent component filters of natural images compared with simple cells in primary visual cortex , 1998, Proceedings of the Royal Society of London. Series B: Biological Sciences.

[7]  I. Moorhead,et al.  Natural Images , 2000, Perception.

[8]  Aapo Hyvärinen,et al.  Emergence of Phase- and Shift-Invariant Features by Decomposition of Natural Images into Independent Feature Subspaces , 2000, Neural Computation.

[9]  Y. Pawitan In all likelihood , 2001 .

[10]  Jitendra Malik,et al.  A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[11]  Roger Sauter,et al.  In All Likelihood , 2002, Technometrics.

[12]  David Mumford,et al.  Occlusion Models for Natural Images: A Statistical Study of a Scale-Invariant Dead Leaves Model , 2004, International Journal of Computer Vision.

[13]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[14]  Geoffrey E. Hinton,et al.  Modeling image patches with a directed hierarchy of Markov random fields , 2007, NIPS.

[15]  Yiannis Aloimonos,et al.  Who killed the directed model? , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Jürgen Schmidhuber,et al.  Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks , 2008, NIPS.

[17]  T. Munich,et al.  Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks , 2008, NIPS.

[18]  Michael J. Black,et al.  Fields of Experts , 2009, International Journal of Computer Vision.

[19]  Honglak Lee,et al.  Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations , 2009, ICML '09.

[20]  Geoffrey E. Hinton,et al.  Learning Generative Texture Models with extended Fields-of-Experts , 2009, BMVC.

[21]  Matthias Bethge,et al.  Lower bounds on the redundancy of natural images , 2010, Vision Research.

[22]  Hugo Larochelle,et al.  The Neural Autoregressive Distribution Estimator , 2011, AISTATS.

[23]  Yair Weiss,et al.  From learning models of natural image patches to whole image restoration , 2011, 2011 International Conference on Computer Vision.

[24]  Matthias Bethge,et al.  In All Likelihood, Deep Belief Is Not Enough , 2010, J. Mach. Learn. Res..

[25]  Jiquan Ngiam,et al.  Learning Deep Energy Models , 2011, ICML.

[26]  Geoffrey E. Hinton,et al.  On deep generative models with applications to recognition , 2011, CVPR 2011.

[27]  Yair Weiss,et al.  "Natural Images, Gaussian Mixtures and Dead Leaves" , 2012, NIPS.

[28]  Nitish Srivastava,et al.  Multimodal learning with deep Boltzmann machines , 2012, J. Mach. Learn. Res..

[29]  M. Bethge,et al.  Mixtures of Conditional Gaussian Scale Mixtures Applied to Multiscale Image Representations , 2011, PloS one.

[30]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[31]  Matthias Bethge,et al.  Training sparse natural image models with a fast Gibbs sampler of an extended state space , 2012, NIPS.

[32]  Hermann Ney,et al.  LSTM Neural Networks for Language Modeling , 2012, INTERSPEECH.

[33]  Hugo Larochelle,et al.  RNADE: The real-valued neural autoregressive density-estimator , 2013, NIPS.

[34]  Benjamin Schrauwen,et al.  Factoring Variations in Natural Images with Deep Gaussian Mixture Models , 2014, NIPS.

[35]  Marc'Aurelio Ranzato,et al.  Video (language) modeling: a baseline for generative models of natural videos , 2014, ArXiv.

[36]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[37]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[38]  Max Welling,et al.  Semi-supervised Learning with Deep Generative Models , 2014, NIPS.

[39]  Daan Wierstra,et al.  Deep AutoRegressive Networks , 2013, ICML.

[40]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[41]  Hugo Larochelle,et al.  A Deep and Tractable Density Estimator , 2013, ICML.

[42]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[43]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[44]  Benjamin Schrauwen,et al.  The student-t mixture as a natural image patch prior with application to image compression , 2014, J. Mach. Learn. Res..

[45]  Surya Ganguli,et al.  Deep Unsupervised Learning using Nonequilibrium Thermodynamics , 2015, ICML.

[46]  Rob Fergus,et al.  Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks , 2015, NIPS.

[47]  Nitish Srivastava,et al.  Unsupervised Learning of Video Representations using LSTMs , 2015, ICML.

[48]  Alex Graves,et al.  DRAW: A Recurrent Neural Network For Image Generation , 2015, ICML.

[49]  Richard S. Zemel,et al.  Generative Moment Matching Networks , 2015, ICML.

[50]  Matthias Bethge,et al.  Modeling Natural Image Statistics , 2015 .

[51]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[52]  Jana Reinhard,et al.  Textures A Photographic Album For Artists And Designers , 2016 .