论文信息 - Building high-level features using large scale unsupervised learning

Building high-level features using large scale unsupervised learning

We consider the problem of building high-level, class-specific feature detectors from only unlabeled data. For example, is it possible to learn a face detector using only unlabeled images? To answer this, we train a deep sparse autoencoder on a large dataset of images (the model has 1 billion connections, the dataset has 10 million 200×200 pixel images downloaded from the Internet). We train this network using model parallelism and asynchronous SGD on a cluster with 1,000 machines (16,000 cores) for three days. Contrary to what appears to be a widely-held intuition, our experimental results reveal that it is possible to train a face detector without having to label images as containing a face or not. Control experiments show that this feature detector is robust not only to translation but also to scaling and out-of-plane rotation. We also find that the same network is sensitive to other high-level concepts such as cat faces and human bodies. Starting from these learned features, we trained our network to recognize 22,000 object categories from ImageNet and achieve a leap of 70% relative improvement over the previous state-of-the-art.

[1] D. Hubel,et al. Receptive fields of single neurones in the cat's striate cortex , 1959, The Journal of physiology.

[2] D. Hubel,et al. Receptive fields, binocular interaction and functional architecture in the cat's visual cortex , 1962, The Journal of physiology.

[3] Kunihiko Fukushima,et al. Neocognitron: A new algorithm for pattern recognition tolerant of deformations and shifts in position , 1982, Pattern Recognit..

[4] R. Desimone,et al. Stimulus-selective properties of inferior temporal neurons in the macaque , 1984, The Journal of neuroscience : the official journal of the Society for Neuroscience.

[5] David J. Field,et al. Emergence of simple-cell receptive field properties by learning a sparse code for natural images , 1996, Nature.

[6] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[7] T. Poggio,et al. Hierarchical models of object recognition in cortex , 1999, Nature Neuroscience.

[8] Yoshua Bengio,et al. Object Recognition with Gradient-Based Learning , 1999, Shape, Contour and Grouping in Computer Vision.

[9] Peter Dayan,et al. Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems , 2001 .

[10] H. Bekkering,et al. Developmental psychology: Rational imitation in preverbal infants , 2002, Nature.

[11] B. Pakkenberg,et al. Aging and the human neocortex , 2003, Experimental Gerontology.

[12] P. Lennie. Receptive fields , 2003, Current Biology.

[13] Laurenz Wiskott,et al. Slow feature analysis yields a rich repertoire of complex cell properties. , 2005, Journal of vision.

[14] C. Koch,et al. Invariant visual representation by single neurons in the human brain , 2005, Nature.

[15] Yoshua Bengio,et al. Greedy Layer-Wise Training of Deep Networks , 2006, NIPS.

[16] Geoffrey E. Hinton,et al. Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[17] Yee Whye Teh,et al. A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[18] Rajat Raina,et al. Efficient sparse coding algorithms , 2006, NIPS.

[19] Honglak Lee,et al. Sparse deep belief net model for visual area V2 , 2007, NIPS.

[20] Yoshua Bengio,et al. Scaling learning algorithms towards AI , 2007 .

[21] Rajat Raina,et al. Self-taught learning: transfer learning from unlabeled data , 2007, ICML '07.

[22] Marc'Aurelio Ranzato,et al. Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[23] Eero P. Simoncelli,et al. Nonlinear image representation using divisive normalization , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[24] Nicolas Pinto,et al. Why is Real-World Visual Object Recognition Hard? , 2008, PLoS Comput. Biol..

[25] Weiwei Zhang,et al. Cat Head Detection - How to Effectively Exploit Shape and Texture Features , 2008, ECCV.

[26] Marwan Mattar,et al. Labeled Faces in the Wild: A Database forStudying Face Recognition in Unconstrained Environments , 2008 .

[27] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[28] Honglak Lee,et al. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations , 2009, ICML '09.

[29] Yann LeCun,et al. What is the best multi-stage architecture for object recognition? , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[30] Quoc V. Le,et al. Measuring Invariances in Deep Networks , 2009, NIPS.

[31] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009 .

[32] Pascal Vincent,et al. Visualizing Higher-Layer Features of a Deep Network , 2009 .

[33] Le Li,et al. SENSC: a Stable and Efficient Algorithm for Nonnegative Sparse Coding: SENSC: a Stable and Efficient Algorithm for Nonnegative Sparse Coding , 2009 .

[34] Rajat Raina,et al. Large-scale deep unsupervised learning using graphics processors , 2009, ICML '09.

[35] Quoc V. Le,et al. Tiled convolutional neural networks , 2010, NIPS.

[36] Yann LeCun,et al. Emergence of Complex-Like Cells in a Temporal Product Network with Local Receptive Fields , 2010, ArXiv.

[37] Shimon Ullman,et al. The chains model for detecting parts by their context , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[38] Klaus-Robert Müller,et al. Layer-wise analysis of deep networks with Gaussian kernels , 2010, NIPS.

[39] Fei-Fei Li,et al. What Does Classifying More Than 10, 000 Image Categories Tell Us? , 2010, ECCV.

[40] Luca Maria Gambardella,et al. Deep Big Simple Neural Nets Excel on Handwritten Digit Recognition , 2010, ArXiv.

[41] Luca Maria Gambardella,et al. Deep, Big, Simple Neural Nets for Handwritten Digit Recognition , 2010, Neural Computation.

[42] Quoc V. Le,et al. On optimization methods for deep learning , 2011, ICML.

[43] Jason Weston,et al. WSABIE: Scaling Up to Large Vocabulary Image Annotation , 2011, IJCAI.

[44] Quoc V. Le,et al. ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning , 2011, NIPS.

[45] Yann LeCun,et al. Traffic sign recognition with multi-scale Convolutional Networks , 2011, The 2011 International Joint Conference on Neural Networks.

[46] Dariu Gavrila,et al. A new benchmark for stereo-based pedestrian detection , 2011, 2011 IEEE Intelligent Vehicles Symposium (IV).

[47] Honglak Lee,et al. An Analysis of Single-Layer Networks in Unsupervised Feature Learning , 2011, AISTATS.

[48] Florent Perronnin,et al. High-dimensional signature compression for large-scale image classification , 2011, CVPR 2011.

[49] Marc'Aurelio Ranzato,et al. Large Scale Distributed Deep Networks , 2012, NIPS.

[50] James J. DiCarlo,et al. How Does the Brain Solve Visual Object Recognition? , 2012, Neuron.