Deep Learning for Computer Vision: A Brief Review

In recent years, deep learning methods have been shown to outperform previous state-of-the-art machine learning techniques in several fields, with computer vision being one of the most prominent cases. This review paper provides a brief overview of some of the most significant deep learning schemes used in computer vision problems, namely Convolutional Neural Networks, Deep Boltzmann Machines and Deep Belief Networks, and Stacked Denoising Autoencoders. A short account of their history, structure, advantages, and limitations is given, followed by a description of their applications in various computer vision tasks, such as object detection, face recognition, action and activity recognition, and human pose estimation. Finally, future directions in designing deep learning schemes for computer vision problems, and the challenges involved therein, are briefly outlined.
