Paper information - Attend, Infer, Repeat: Fast Scene Understanding with Generative Models

Attend, Infer, Repeat: Fast Scene Understanding with Generative Models

We present a framework for efficient inference in structured image models that explicitly reason about objects. We achieve this by performing probabilistic inference using a recurrent neural network that attends to scene elements and processes them one at a time. Crucially, the model itself learns to choose the appropriate number of inference steps. We use this scheme to learn to perform inference in partially specified 2D models (variable-sized variational auto-encoders) and fully specified 3D models (probabilistic renderers). We show that such models learn to identify multiple objects - counting, locating and classifying the elements of a scene - without any supervision, e.g., decomposing 3D images with various numbers of objects in a single forward pass of a neural network. We further show that the networks produce accurate inferences when compared to supervised counterparts, and that their structure leads to improved generalization.
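The variable-length inference loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `infer_scene`, `toy_step`, and the per-step "presence" variable used to decide when to stop are hypothetical names and assumptions introduced here, and the hand-coded `toy_step` stands in for a trained recurrent network. The sketch only shows the control flow: attend to one element, record a latent description of it, and repeat until the model signals that nothing is left.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def infer_scene(image, step_fn, max_steps=3, rng=None):
    """Run a variable-length inference loop over one image.

    step_fn(image, state) -> (new_state, presence_logit, latent) is a
    hypothetical stand-in for one step of a recurrent inference network.
    """
    rng = rng or random.Random(0)
    state = None
    objects = []
    for _ in range(max_steps):
        state, presence_logit, latent = step_fn(image, state)
        # Assumption: stopping is decided by sampling a binary
        # "is there another object?" variable at every step.
        if rng.random() >= sigmoid(presence_logit):
            break  # the model chose to stop early
        objects.append(latent)
    return objects


def toy_step(image, state):
    """Hand-coded toy step (not learned): attend to the brightest pixel
    of a 1D "image" that has not been explained yet."""
    explained = state if state is not None else frozenset()
    remaining = [(v, i) for i, v in enumerate(image) if i not in explained]
    value, index = max(remaining) if remaining else (0.0, -1)
    # Confident "present" only when the attended pixel is bright.
    logit = 50.0 if value > 0.5 else -50.0
    return explained | {index}, logit, {"where": index, "what": value}


if __name__ == "__main__":
    scene = [0.0, 0.9, 0.0, 0.8, 0.0]  # two bright "objects"
    found = infer_scene(scene, toy_step, max_steps=4)
    print(len(found), [obj["where"] for obj in found])
```

In this toy setting the loop stops after two steps even though four are allowed, which mirrors the idea that the number of inference steps depends on the scene rather than being fixed in advance.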
