Deep learning for visual understanding: A review

Deep learning algorithms are a subset of the machine learning algorithms, which aim at discovering multiple levels of distributed representations. Recently, numerous deep learning algorithms have been proposed to solve traditional artificial intelligence problems. This work aims to review the state-of-the-art in deep learning algorithms in computer vision by highlighting the contributions and challenges from over 210 recent research papers. It first gives an overview of various deep learning approaches and their recent developments, and then briefly describes their applications in diverse vision tasks, such as image classification, object detection, image retrieval, semantic segmentation and human pose estimation. Finally, the paper summarizes the future trends and challenges in designing and training deep neural networks.

[1]  Qi Tian,et al.  Search by Detection: Object-Level Feature for Image Retrieval , 2014, ICIMCS '14.

[2]  Christian Szegedy,et al.  DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[4]  Cheng-Yuan Liou,et al.  Autoencoder for words , 2014, Neurocomputing.

[5]  Jonathan Tompson,et al.  Efficient object localization using Convolutional Networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Wensheng Zhang,et al.  A novel sparse auto-encoder for deep unsupervised learning , 2013, 2013 Sixth International Conference on Advanced Computational Intelligence (ICACI).

[7]  Antonin Chambolle,et al.  Nonlinear wavelet image processing: variational problems, compression, and noise removal through wavelet shrinkage , 1998, IEEE Trans. Image Process..

[8]  David Haussler,et al.  Exploiting Generative Models in Discriminative Classifiers , 1998, NIPS.

[9]  Xiaogang Wang,et al.  DeepID-Net: Deformable deep convolutional neural networks for object detection , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Tapani Raiko,et al.  Enhanced Gradient and Adaptive Learning Rate for Training Restricted Boltzmann Machines , 2011, ICML.

[11]  Andrew Y. Ng,et al.  The Importance of Encoding Versus Training with Sparse Coding and Vector Quantization , 2011, ICML.

[12]  Chris Eliasmith,et al.  Deep networks for robust visual recognition , 2010, ICML.

[13]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Geoffrey E. Hinton,et al.  An Efficient Learning Procedure for Deep Boltzmann Machines , 2012, Neural Computation.

[15]  Sanja Fidler,et al.  segDeepM: Exploiting segmentation and context in deep neural networks for object detection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Rajat Raina,et al.  Self-taught learning: transfer learning from unlabeled data , 2007, ICML '07.

[17]  Nitish Srivastava,et al.  Multimodal learning with deep Boltzmann machines , 2012, J. Mach. Learn. Res..

[18]  Derek Hoiem,et al.  Diagnosing Error in Object Detectors , 2012, ECCV.

[19]  Jian Sun,et al.  Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[20]  Marc Teboulle,et al.  A fast Iterative Shrinkage-Thresholding Algorithm with application to wavelet-based image deblurring , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[21]  Ivan Laptev,et al.  Is object localization for free? - Weakly-supervised learning with convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  P ? ? ? ? ? ? ? % ? ? ? ? , 1991 .

[23]  Hongxun Yao,et al.  Visualizing and Comparing Convolutional Neural Networks , 2014, ArXiv.

[24]  Rob Fergus,et al.  Stochastic Pooling for Regularization of Deep Convolutional Neural Networks , 2013, ICLR.

[25]  Honglak Lee,et al.  Sparse deep belief net model for visual area V2 , 2007, NIPS.

[26]  Yann LeCun,et al.  Regularization of Neural Networks using DropConnect , 2013, ICML.

[27]  Mark Everingham,et al.  Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation , 2010, BMVC.

[28]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[29]  Thomas Brox,et al.  Unsupervised feature learning by augmenting single images , 2013, ICLR.

[30]  Xiaogang Wang,et al.  Highly Efficient Forward and Backward Propagation of Convolutional Neural Networks for Pixelwise Classification , 2014, ArXiv.

[31]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, CVPR.

[32]  Jian Sun,et al.  Convolutional neural networks at constrained time cost , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Jiquan Ngiam,et al.  Learning Deep Energy Models , 2011, ICML.

[34]  Robinson Piramuthu,et al.  HD-CNN: Hierarchical Deep Convolutional Neural Networks for Large Scale Visual Recognition , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[35]  Hugo Larochelle,et al.  Efficient Learning of Deep Boltzmann Machines , 2010, AISTATS.

[36]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[37]  Yoshua Bengio,et al.  An empirical analysis of dropout in piecewise linear networks , 2013, ICLR.

[38]  Eero P. Simoncelli 4.7 – Statistical Modeling of Photographic Images , 2005 .

[39]  Daniel L. K. Yamins,et al.  Deep Neural Networks Rival the Representation of Primate IT Cortex for Core Visual Object Recognition , 2014, PLoS Comput. Biol..

[40]  Christopher D. Manning,et al.  Fast dropout training , 2013, ICML.

[41]  Slobodan Ilic,et al.  Robust Human Body Shape and Pose Tracking , 2013, 2013 International Conference on 3D Vision.

[42]  Will Y. Zou Unsupervised learning of visual invariance with temporal coherence , 2011 .

[43]  Yoshua Bengio,et al.  How transferable are features in deep neural networks? , 2014, NIPS.

[44]  Yoshua Bengio,et al.  Deep Learning of Representations: Looking Forward , 2013, SLSP.

[45]  Derek Hoiem,et al.  Category Independent Object Proposals , 2010, ECCV.

[46]  Honglak Lee,et al.  Unsupervised learning of hierarchical representations with convolutional deep belief networks , 2011, Commun. ACM.

[47]  Ali Farhadi,et al.  Learning Everything about Anything: Webly-Supervised Visual Concept Learning , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[48]  Nicolas Heess,et al.  The Shape Boltzmann Machine: A strong model of object shape , 2012, CVPR.

[49]  David A. McAllester A PAC-Bayesian Tutorial with A Dropout Bound , 2013, ArXiv.

[50]  Trevor Darrell,et al.  LSDA: Large Scale Detection through Adaptation , 2014, NIPS.

[51]  Liang Lin,et al.  Deep Joint Task Learning for Generic Object Extraction , 2014, NIPS.

[52]  한보형,et al.  Learning Deconvolution Network for Semantic Segmentation , 2015 .

[53]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[54]  Bernhard E. Boser,et al.  A training algorithm for optimal margin classifiers , 1992, COLT '92.

[55]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[56]  Victor S. Lempitsky,et al.  Neural Codes for Image Retrieval , 2014, ECCV.

[57]  Yoshua Bengio,et al.  What regularized auto-encoders learn from the data-generating distribution , 2012, J. Mach. Learn. Res..

[58]  Yoshua Bengio,et al.  Multi-Prediction Deep Boltzmann Machines , 2013, NIPS.

[59]  Ivan Laptev,et al.  Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[60]  Honglak Lee,et al.  Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations , 2009, ICML '09.

[61]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[62]  Yoshua Bengio,et al.  Unsupervised and Transfer Learning Challenge: a Deep Learning Approach , 2011, ICML Unsupervised and Transfer Learning.

[63]  Ben Taskar,et al.  MODEC: Multimodal Decomposable Models for Human Pose Estimation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[64]  David J. Field,et al.  Sparse coding with an overcomplete basis set: A strategy employed by V1? , 1997, Vision Research.

[65]  Robinson Piramuthu,et al.  ConceptLearner: Discovering visual concepts from weakly labeled image collections , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[66]  Quoc V. Le,et al.  Measuring Invariances in Deep Networks , 2009, NIPS.

[67]  Ming Yang,et al.  Large-scale image classification: Fast feature extraction and SVM training , 2011, CVPR 2011.

[68]  Yong Jae Lee,et al.  Weakly-supervised Discovery of Visual Pattern Configurations , 2014, NIPS.

[69]  Qi Tian,et al.  Seeing the Big Picture: Deep Embedding with Contextual Evidences , 2014, ArXiv.

[70]  Nikos Komodakis,et al.  Object Detection via a Multi-region and Semantic Segmentation-Aware CNN Model , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[71]  Sven Behnke,et al.  Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition , 2010, ICANN.

[72]  Geoffrey E. Hinton,et al.  Learning representations by back-propagating errors , 1986, Nature.

[73]  Adam Krzyzak,et al.  Analysis of Feature Maps Selection in Supervised Learning Using Convolutional Neural Networks , 2014, Canadian Conference on AI.

[74]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Neural Networks , 2013 .

[75]  Yoshua Bengio,et al.  Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.

[76]  Derek C. Rose,et al.  Deep Machine Learning - A New Frontier in Artificial Intelligence Research [Research Frontier] , 2010, IEEE Computational Intelligence Magazine.

[77]  Yuandong Tian,et al.  Exploring the Spatial Hierarchy of Mixture Models for Human Pose Estimation , 2012, ECCV.

[78]  Xiaogang Wang,et al.  Multi-stage Contextual Deep Learning for Pedestrian Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[79]  Jonathan Tompson,et al.  Learning Human Pose Estimation Features with Convolutional Networks , 2013, ICLR.

[80]  Luc Van Gool,et al.  Human Pose Estimation Using Body Parts Dependent Joint Regressors , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[81]  Jürgen Schmidhuber,et al.  Deep learning in neural networks: An overview , 2014, Neural Networks.

[82]  Geoffrey E. Hinton,et al.  Deep Boltzmann Machines , 2009, AISTATS.

[83]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[84]  Pascal Vincent,et al.  Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[85]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[86]  Shuicheng Yan,et al.  An HOG-LBP human detector with partial occlusion handling , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[87]  Vibhav Vineet,et al.  Conditional Random Fields as Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[88]  Miguel Á. Carreira-Perpiñán,et al.  Distributed optimization of deeply nested systems , 2012, AISTATS.

[89]  Yoshua. Bengio,et al.  Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..

[90]  Shiguang Shan,et al.  Coarse-to-Fine Auto-Encoder Networks (CFAN) for Real-Time Face Alignment , 2014, ECCV.

[91]  Jason Weston,et al.  Joint Learning of Words and Meaning Representations for Open-Text Semantic Parsing , 2012, AISTATS.

[92]  Klaus-Robert Müller,et al.  Deep Boltzmann Machines and the Centering Trick , 2012, Neural Networks: Tricks of the Trade.

[93]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[94]  John D. Lafferty,et al.  Learning image representations from the pixel level via hierarchical sparse coding , 2011, CVPR 2011.

[95]  Dumitru Erhan,et al.  Scalable Object Detection Using Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[96]  Jonathan Tompson,et al.  Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation , 2014, NIPS.

[97]  Bogdan Miclut Committees of Deep Feedforward Networks Trained with Few Data , 2014, GCPR.

[98]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[99]  Sida I. Wang,et al.  Dropout Training as Adaptive Regularization , 2013, NIPS.

[100]  Yann LeCun,et al.  Learning Fast Approximations of Sparse Coding , 2010, ICML.

[101]  Xinlei Chen,et al.  NEIL: Extracting Visual Knowledge from Web Data , 2013, 2013 IEEE International Conference on Computer Vision.

[102]  Guillermo Sapiro,et al.  Online Learning for Matrix Factorization and Sparse Coding , 2009, J. Mach. Learn. Res..

[103]  Yuting Zhang,et al.  Improving object detection with deep convolutional networks via Bayesian optimization and structured prediction , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[104]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[105]  Jason Yosinski,et al.  Deep neural networks are easily fooled: High confidence predictions for unrecognizable images , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[106]  D. Whitteridge,et al.  Learning and Relearning , 1959, Science's STKE.

[107]  Yann LeCun,et al.  Saturating Auto-Encoders , 2013, ICLR 2013.

[108]  Jitendra Malik,et al.  Hypercolumns for object segmentation and fine-grained localization , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[109]  Xiaogang Wang,et al.  DeepID-Net: multi-stage and deformable deep convolutional neural networks for object detection , 2014, ArXiv.

[110]  Geoffrey E. Hinton A Practical Guide to Training Restricted Boltzmann Machines , 2012, Neural Networks: Tricks of the Trade.

[111]  Jun Yu,et al.  Human pose recovery by supervised spectral embedding , 2015, Neurocomputing.

[112]  Matthew D. Zeiler Hierarchical Convolutional Deep Learning in Computer Vision , 2013 .

[113]  Marc'Aurelio Ranzato,et al.  Efficient Learning of Sparse Representations with an Energy-Based Model , 2006, NIPS.

[114]  Pascal Vincent,et al.  Contractive Auto-Encoders: Explicit Invariance During Feature Extraction , 2011, ICML.

[115]  Jürgen Schmidhuber,et al.  Transfer learning for Latin and Chinese characters with Deep Neural Networks , 2012, The 2012 International Joint Conference on Neural Networks (IJCNN).

[116]  Krishnakumar Balasubramanian,et al.  Smooth sparse coding via marginal regression for learning sparse representations , 2012, Artif. Intell..

[117]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[118]  Derek Hoiem,et al.  Learning to localize detected objects , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[119]  Yunchao Wei,et al.  Towards Computational Baby Learning: A Weakly-Supervised Approach for Object Detection , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[120]  Jürgen Schmidhuber,et al.  Multi-column deep neural network for traffic sign classification , 2012, Neural Networks.

[121]  Elnaz Jahani Heravi Large Scale Object Detection and Recognition , 2014 .

[122]  Thomas S. Huang,et al.  Image Classification Using Super-Vector Coding of Local Image Descriptors , 2010, ECCV.

[123]  Guillermo Sapiro,et al.  Online dictionary learning for sparse coding , 2009, ICML '09.

[124]  Liang-Tien Chia,et al.  Local features are not lonely – Laplacian sparse coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[125]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[126]  Juha Karhunen,et al.  A Two-Stage Pretraining Algorithm for Deep Boltzmann Machines , 2013, ICANN.

[127]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[128]  Y. Censor,et al.  Parallel Optimization: Theory, Algorithms, and Applications , 1997 .

[129]  Yanjun Qi,et al.  Unsupervised Feature Learning by Deep Sparse Coding , 2013, SDM.

[130]  Jitendra Malik,et al.  Analyzing the Performance of Multilayer Neural Networks for Object Recognition , 2014, ECCV.

[131]  Jürgen Schmidhuber,et al.  Multi-column deep neural networks for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[132]  Graham W. Taylor,et al.  Adaptive deconvolutional networks for mid and high level feature learning , 2011, 2011 International Conference on Computer Vision.

[133]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[134]  Hossein Mobahi,et al.  Deep Learning via Semi-supervised Embedding , 2012, Neural Networks: Tricks of the Trade.

[135]  Orhan Firat,et al.  Learning Deep Temporal Representations for Brain Decoding , 2014, ArXiv.

[136]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[137]  Pascal Vincent,et al.  Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion , 2010, J. Mach. Learn. Res..

[138]  Luca Maria Gambardella,et al.  High-Performance Neural Networks for Visual Object Classification , 2011, ArXiv.

[139]  Geoffrey E. Hinton,et al.  Learning and relearning in Boltzmann machines , 1986 .

[140]  Luca Maria Gambardella,et al.  Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence Flexible, High Performance Convolutional Neural Networks for Image Classification , 2022 .

[141]  Yoshua Bengio,et al.  Joint Training Deep Boltzmann Machines for Classification , 2013, ICLR.

[142]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[143]  R. Tibshirani,et al.  PATHWISE COORDINATE OPTIMIZATION , 2007, 0708.1485.

[144]  Xiang Zhang,et al.  OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[145]  Rajat Raina,et al.  Efficient sparse coding algorithms , 2006, NIPS.

[146]  Shengen Yan,et al.  Deep Image: Scaling up Image Recognition , 2015, ArXiv.

[147]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[148]  Xiaogang Wang,et al.  Multi-source Deep Learning for Human Pose Estimation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[149]  Jian Sun,et al.  Convolutional feature masking for joint object and stuff segmentation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[150]  Alan L. Yuille,et al.  Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations , 2014, NIPS.

[151]  Xiaogang Wang,et al.  Deep Convolutional Network Cascade for Facial Point Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[152]  Yoshua Bengio,et al.  Why Does Unsupervised Pre-training Help Deep Learning? , 2010, AISTATS.

[153]  Liang-Tien Chia,et al.  Laplacian Sparse Coding, Hypergraph Laplacian Sparse Coding, and Applications , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[154]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[155]  L. Younes On the convergence of markovian stochastic algorithms with rapidly decreasing ergodicity rates , 1999 .

[156]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[157]  Kenji Doya,et al.  Expected energy-based restricted Boltzmann machine for classification , 2015, Neural Networks.

[158]  Jian Dong,et al.  Towards Unified Object Detection and Semantic Segmentation , 2014, ECCV.

[159]  Geoffrey E. Hinton,et al.  3D Object Recognition with Deep Belief Nets , 2009, NIPS.

[160]  Li Deng,et al.  A tutorial survey of architectures, algorithms, and applications for deep learning , 2014, APSIPA Transactions on Signal and Information Processing.

[161]  Marc'Aurelio Ranzato,et al.  Building high-level features using large scale unsupervised learning , 2011, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[162]  Honglak Lee,et al.  Learning hierarchical representations for face verification with convolutional deep belief networks , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[163]  Geoffrey E. Hinton,et al.  A Better Way to Pretrain Deep Boltzmann Machines , 2012, NIPS.

[164]  Yihong Gong,et al.  Nonlinear Learning using Local Coordinate Coding , 2009, NIPS.

[165]  Honglak Lee,et al.  Augmenting CRFs with Boltzmann Machine Shape Priors for Image Labeling , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[166]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[167]  Andrew Zisserman,et al.  2D Articulated Human Pose Estimation and Retrieval in (Almost) Unconstrained Still Images , 2012, International Journal of Computer Vision.

[168]  Y. Censor,et al.  Parallel Optimization:theory , 1997 .

[169]  Marc'Aurelio Ranzato,et al.  Fast Inference in Sparse Coding Algorithms with Applications to Object Recognition , 2010, ArXiv.

[170]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[171]  Thomas Mensink,et al.  Improving the Fisher Kernel for Large-Scale Image Classification , 2010, ECCV.

[172]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[173]  Yann LeCun,et al.  Learning Invariant Feature Hierarchies , 2012, ECCV Workshops.

[174]  Robinson Piramuthu,et al.  HD-CNN: Hierarchical Deep Convolutional Neural Network for Image Classification , 2014, ArXiv.

[175]  John B. Shoven,et al.  I , Edinburgh Medical and Surgical Journal.

[176]  Svetlana Lazebnik,et al.  Multi-scale Orderless Pooling of Deep Convolutional Activation Features , 2014, ECCV.

[177]  Devansh Arpit,et al.  Is Joint Training Better for Deep Auto-Encoders? , 2014 .

[178]  D. Donoho For most large underdetermined systems of equations, the minimal 𝓁1‐norm near‐solution approximates the sparsest near‐solution , 2006 .

[179]  George Papandreou,et al.  Weakly- and Semi-Supervised Learning of a DCNN for Semantic Image Segmentation , 2015, ArXiv.

[180]  Jonathan Tompson,et al.  MoDeep: A Deep Learning Framework Using Motion Features for Human Pose Estimation , 2014, ACCV.

[181]  Andrew Zisserman,et al.  Deep Convolutional Neural Networks for Efficient Pose Estimation in Gesture Videos , 2014, ACCV.

[182]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[183]  Jian Sun,et al.  BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[184]  Christian Wolf,et al.  Spatio-Temporal Convolutional Sparse Auto-Encoder for Sequence Classification , 2012, BMVC.

[185]  Yi Li,et al.  Beyond Physical Connections: Tree Models in Human Pose Estimation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[186]  Andrew Zisserman,et al.  Deep Fisher Networks for Large-Scale Image Classification , 2013, NIPS.

[187]  Trevor Darrell,et al.  From Large-Scale Object Classifiers to Large-Scale Object Detectors: An Adaptation Approach , 2014 .

[188]  Brendan J. Frey,et al.  Adaptive dropout for training deep neural networks , 2013, NIPS.

[189]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[190]  Miguel Á. Carreira-Perpiñán,et al.  On Contrastive Divergence Learning , 2005, AISTATS.

[191]  Bernt Schiele,et al.  How good are detection proposals, really? , 2014, BMVC.

[192]  Ji Wan,et al.  Deep Learning for Content-Based Image Retrieval: A Comprehensive Study , 2014, ACM Multimedia.

[193]  Philip H. S. Torr,et al.  BING: Binarized normed gradients for objectness estimation at 300fps , 2014, Computational Visual Media.

[194]  In-So Kweon,et al.  Multi-scale pyramid pooling for deep convolutional representation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[195]  Pierre Baldi,et al.  Understanding Dropout , 2013, NIPS.

[197]  Nitish Srivastava,et al.  Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.

[198]  Geoffrey E. Hinton,et al.  Phone Recognition with the Mean-Covariance Restricted Boltzmann Machine , 2010, NIPS.

[199]  Jitendra Malik,et al.  Simultaneous Detection and Segmentation , 2014, ECCV.

[200]  Qiang Chen,et al.  Contextualizing Object Detection and Classification , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[201]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[202]  Dumitru Erhan,et al.  Deep Neural Networks for Object Detection , 2013, NIPS.

[203]  Kang Zheng,et al.  Combining local appearance and holistic view: Dual-Source Deep Neural Networks for human pose estimation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[204]  Peter V. Gehler,et al.  Poselet Conditioned Pictorial Structures , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[205]  Yu Liu,et al.  DeepIndex for Accurate and Efficient Image Retrieval , 2015, ICMR.

[206]  Guosheng Lin,et al.  Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[207]  Gregory Shakhnarovich,et al.  Feedforward semantic segmentation with zoom-out features , 2014, CVPR.

[208]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[209]  Andrew G. Howard,et al.  Some Improvements on Deep Convolutional Neural Network Based Image Classification , 2013, ICLR.

[210]  Jürgen Schmidhuber,et al.  Stacked Convolutional Auto-Encoders for Hierarchical Feature Extraction , 2011, ICANN.

[211]  Quoc V. Le,et al.  On optimization methods for deep learning , 2011, ICML.

[212]  C. Lawrence Zitnick,et al.  Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.

[213]  Qiang Chen,et al.  Network In Network , 2013, ICLR.

[214]  Roland Memisevic,et al.  Zero-bias autoencoders and the benefits of co-adapting features , 2014, ICLR.

[215]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[216]  Zhang Xiong,et al.  3D object retrieval with stacked local convolutional autoencoder , 2015, Signal Process..

[217]  Jean Ponce,et al.  A Theoretical Analysis of Feature Pooling in Visual Recognition , 2010, ICML.

[218]  Li Xu,et al.  On Vectorization of Deep Convolutional Neural Networks for Vision Tasks , 2015, AAAI.

[219]  Thomas Deselaers,et al.  Measuring the Objectness of Image Windows , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[220]  Quoc V. Le,et al.  Tiled convolutional neural networks , 2010, NIPS.

[221]  Andrew Zisserman,et al.  Flowing ConvNets for Human Pose Estimation in Videos , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).