Going deeper with convolutions

We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.

[1]  D. Hubel,et al.  Receptive fields, binocular interaction and functional architecture in the cat's visual cortex , 1962, The Journal of physiology.

[2]  Berthold K. P. Horn,et al.  Closed-form solution of absolute orientation using unit quaternions , 1987 .

[3]  Lawrence D. Jackel,et al.  Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[4]  L. Rudin,et al.  Nonlinear total variation based noise removal algorithms , 1992 .

[5]  Narendra Ahuja,et al.  Motion and Structure from Line Correspondences; Closed-Form Solution, Uniqueness, and Optimization , 1992, IEEE Trans. Pattern Anal. Mach. Intell..

[6]  Boris Polyak,et al.  Acceleration of stochastic approximation by averaging , 1992 .

[7]  S. Sutherland Eye, brain and vision , 1993, Nature.

[8]  Edward H. Adelson,et al.  Representing moving images with layers , 1994, IEEE Trans. Image Process..

[9]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[10]  Richard I. Hartley,et al.  Projective reconstruction from line correspondences , 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[11]  M. Bradley,et al.  Measuring emotion: the Self-Assessment Manikin and the Semantic Differential. , 1994, Journal of behavior therapy and experimental psychiatry.

[12]  Amnon Shashua,et al.  Algebraic Functions For Recognition , 1995, IEEE Trans. Pattern Anal. Mach. Intell..

[13]  Jorma Rissanen,et al.  MDL-Based Decision Tree Pruning , 1995, KDD.

[14]  Michael J. Black,et al.  Estimating Optical Flow in Segmented Images Using Variable-Order Parametric Models With Local Deformations , 1996, IEEE Trans. Pattern Anal. Mach. Intell..

[15]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[16]  Thomas G. Dietterich,et al.  Solving the Multiple Instance Problem with Axis-Parallel Rectangles , 1997, Artif. Intell..

[17]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[18]  Avrim Blum,et al.  The Bottleneck , 2021, Monopsony Capitalism.

[19]  Olga Veksler,et al.  Fast approximate energy minimization via graph cuts , 2001, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[20]  Andrew E. Johnson,et al.  Using Spin Images for Efficient Object Recognition in Cluttered 3D Scenes , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[21]  David A. Forsyth,et al.  Mixtures of trees for object recognition , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[22]  J. Friedman Greedy function approximation: A gradient boosting machine. , 2001 .

[23]  Bernhard P. Wrobel,et al.  Multiple View Geometry in Computer Vision , 2001 .

[24]  Andrew W. Fitzgibbon,et al.  Simultaneous linear estimation of multiple view geometry and lens distortion , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[25]  D. Scharstein,et al.  A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms , 2001, Proceedings IEEE Workshop on Stereo and Multi-Baseline Vision (SMBV 2001).

[26]  Ben Taskar,et al.  Max-Margin Markov Networks , 2003, NIPS.

[27]  Jiri Matas,et al.  Locally Optimized RANSAC , 2003, DAGM-Symposium.

[28]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[29]  Tommi S. Jaakkola,et al.  Weighted Low-Rank Approximations , 2003, ICML.

[30]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[31]  Charles A. Micchelli,et al.  Kernels for Multi--task Learning , 2004, NIPS.

[32]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[33]  M. Werman,et al.  Color lines: image specific color representation , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[34]  Jitendra Malik,et al.  Representing and Recognizing the Visual Appearance of Materials using Three-dimensional Textons , 2001, International Journal of Computer Vision.

[35]  Minas E. Spetsakis,et al.  Structure from motion using line correspondences , 1990, International Journal of Computer Vision.

[36]  J. Ross Quinlan,et al.  Induction of Decision Trees , 1986, Machine Learning.

[37]  Richard G. Baraniuk,et al.  ForWaRD: Fourier-wavelet regularized deconvolution for ill-conditioned systems , 2004, IEEE Transactions on Signal Processing.

[38]  Tommi S. Jaakkola,et al.  Maximum-Margin Matrix Factorization , 2004, NIPS.

[39]  Richard I. Hartley,et al.  Lines and Points in Three Views and the Trifocal Tensor , 1997, International Journal of Computer Vision.

[40]  Daniel Cremers,et al.  Motion Competition: A Variational Approach to Piecewise Parametric Motion Segmentation , 2005, International Journal of Computer Vision.

[41]  Kilian Q. Weinberger,et al.  Distance Metric Learning for Large Margin Nearest Neighbor Classification , 2005, NIPS.

[42]  Takeo Kanade,et al.  Robust L/sub 1/ norm factorization in the presence of outliers and missing data by alternative convex programming , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[43]  H. Zou,et al.  Regularization and variable selection via the elastic net , 2005 .

[44]  Andrew Blake,et al.  Cosegmentation of Image Pairs by Histogram Matching - Incorporating a Global Constraint into MRFs , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[45]  Navneet Dalal,et al.  Finding People in Images and Videos , 2006 .

[46]  Bernhard Schölkopf,et al.  A Kernel Method for the Two-Sample-Problem , 2006, NIPS.

[47]  Mohammed Bennamoun,et al.  Three-Dimensional Model-Based Object Recognition and Segmentation in Cluttered Scenes , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[48]  Steven M. Seitz,et al.  Photo tourism: exploring photo collections in 3D , 2006, ACM Trans. Graph..

[49]  Anat Levin,et al.  Blind Motion Deblurring Using Image Statistics , 2006, NIPS.

[50]  Vladimir Kolmogorov,et al.  Convergent Tree-Reweighted Message Passing for Energy Minimization , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[51]  Luc Van Gool,et al.  Efficient Mining of Frequent and Distinctive Feature Configurations , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[52]  Florent Perronnin,et al.  Fisher Kernels on Visual Vocabularies for Image Categorization , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[53]  Zhi-Hua Zhou,et al.  Analyzing Co-training Style Algorithms , 2007, ECML.

[54]  Stephen Lin,et al.  Graph Embedding and Extensions: A General Framework for Dimensionality Reduction , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[55]  Thomas Serre,et al.  Robust Object Recognition with Cortex-Like Mechanisms , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[56]  Massimiliano Pontil,et al.  Convex multi-task feature learning , 2008, Machine Learning.

[57]  James W. Davis,et al.  Background-subtraction using contour-based fusion of thermal and visible imagery , 2007, Comput. Vis. Image Underst..

[58]  Polina Golland,et al.  Convex Clustering with Exemplar-Based Models , 2007, NIPS.

[59]  H. Hirschmüller Stereo Processing by Semiglobal Matching and Mutual Information , 2008, IEEE Trans. Pattern Anal. Mach. Intell..

[60]  T. Blumensath,et al.  Iterative Thresholding for Sparse Approximations , 2008 .

[61]  Zuzana Kukelova,et al.  Fast and robust numerical solutions to minimal problems for cameras with radial distortion , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[62]  Cordelia Schmid,et al.  Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[63]  Andrew W. Fitzgibbon,et al.  Global stereo reconstruction under second order smoothness priors , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[64]  Jean-Philippe Vert,et al.  Clustered Multi-Task Learning: A Convex Formulation , 2008, NIPS.

[65]  Alex Graves,et al.  Supervised Sequence Labelling with Recurrent Neural Networks , 2012, Studies in Computational Intelligence.

[66]  Karen O. Egiazarian,et al.  Image restoration by sparse 3D transform-domain collaborative filtering , 2008, Electronic Imaging.

[67]  Michael I. Jordan,et al.  Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes , 2008, NIPS.

[68]  Richard Szeliski,et al.  Skeletal graphs for efficient structure from motion , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[69]  Tim K Marks,et al.  SUN: A Bayesian framework for saliency using natural statistics. , 2008, Journal of vision.

[70]  Adam Finkelstein,et al.  PatchMatch: a randomized correspondence algorithm for structural image editing , 2009, SIGGRAPH 2009.

[71]  R. Szeliski,et al.  Image deblurring and denoising using color priors , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[72]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[73]  Juergen Gall,et al.  Class-specific Hough forests for object detection , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[74]  B. Schiele,et al.  Pedestrian detection: A benchmark , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[75]  Zhi-Hua Zhou,et al.  Multi-instance learning by treating instances as non-I.I.D. samples , 2008, ICML '09.

[76]  Wei Tang,et al.  Clustering with Multiple Graphs , 2009, 2009 Ninth IEEE International Conference on Data Mining.

[77]  Thorsten Joachims,et al.  Learning structural SVMs with latent variables , 2009, ICML '09.

[78]  Daphne Koller,et al.  MAP Estimation of Semi-Metric MRFs via Hierarchical Graph Cuts , 2009, UAI.

[79]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[80]  H. Ishikawa Higher-order clique reduction in binary graph cut , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[81]  Richard Szeliski,et al.  Building Rome in a day , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[82]  Sham M. Kakade,et al.  Multi-view clustering via canonical correlation analysis , 2009, ICML '09.

[83]  Antonio Torralba,et al.  Recognizing indoor scenes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[84]  Seungyong Lee,et al.  Fast motion deblurring , 2009, ACM Trans. Graph..

[85]  Matthew A. Brown,et al.  Minimal Solutions for Panoramic Stitching with Radial Distortion , 2009, BMVC.

[86]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[87]  Levent Tunçel,et al.  Optimization algorithms on matrix manifolds , 2009, Math. Comput..

[88]  Michael J. Black,et al.  Secrets of optical flow estimation and their principles , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[89]  Luc Van Gool,et al.  Hough Transform and 3D SURF for Robust Three Dimensional Classification , 2010, ECCV.

[90]  Xindong Wu,et al.  SMILE: A Similarity-Based Approach for Multiple Instance Learning , 2010, 2010 IEEE International Conference on Data Mining.

[91]  Allan Hanbury,et al.  Affective image classification using features inspired by psychology and art theory , 2010, ACM Multimedia.

[92]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[93]  Jean Ponce,et al.  Non-uniform Deblurring for Shaken Images , 2012, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[94]  Cordelia Schmid,et al.  Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[95]  Reiner Lenz,et al.  Emotion related structures in large image databases , 2010, CIVR '10.

[96]  Nassir Navab,et al.  Model globally, match locally: Efficient and robust 3D object recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[97]  Thomas S. Huang,et al.  Image Classification Using Super-Vector Coding of Local Image Descriptors , 2010, ECCV.

[98]  Jan-Michael Frahm,et al.  Building Rome on a Cloudless Day , 2010, ECCV.

[99]  Ankit Gupta,et al.  Single Image Deblurring Using Motion Density Functions , 2010, ECCV.

[100]  Bora Uçar,et al.  On Two-Dimensional Sparse Matrix Partitioning: Models, Methods, and a Recipe , 2010, SIAM J. Sci. Comput..

[101]  Juan Carlos Niebles,et al.  Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification , 2010, ECCV.

[102]  Thomas S. Huang,et al.  Supervised translation-invariant sparse coding , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[103]  Subhransu Maji,et al.  Detecting People Using Mutually Consistent Poselet Activations , 2010, ECCV.

[104]  Dit-Yan Yeung,et al.  A Convex Formulation for Learning Task Relationships in Multi-Task Learning , 2010, UAI.

[105]  Anton Osokin,et al.  Fast Approximate Energy Minimization with Label Costs , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[106]  William T. Freeman,et al.  Analyzing spatially-varying blur , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[107]  Pietro Perona,et al.  Cascaded pose regression , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[108]  Vikas Sindhwani,et al.  Block Variable Selection in Multivariate Regression and High-dimensional Causal Inference , 2010, NIPS.

[109]  Michael J. Black,et al.  Layered image motion with explicit occlusions, temporal consistency, and depth ordering , 2010, NIPS.

[110]  Antonio Torralba,et al.  Semantic Label Sharing for Learning with Many Categories , 2010, ECCV.

[111]  Charless C. Fowlkes,et al.  Shape-based pedestrian parsing , 2011, CVPR 2011.

[112]  Andrew W. Fitzgibbon,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[113]  Koen E. A. van de Sande,et al.  Segmentation as selective search for object recognition , 2011, 2011 International Conference on Computer Vision.

[114]  Alexei A. Efros,et al.  Data-driven visual similarity for cross-domain image matching , 2011, ACM Trans. Graph..

[115]  Cordelia Schmid,et al.  Action recognition by dense trajectories , 2011, CVPR 2011.

[116]  Yang Wang,et al.  Discriminative figure-centric models for joint action localization and recognition , 2011, 2011 International Conference on Computer Vision.

[117]  Luc Van Gool,et al.  Real time head pose estimation with random regression forests , 2011, CVPR 2011.

[118]  Vincent Lepetit,et al.  Multimodal templates for real-time detection of texture-less objects in heavily cluttered scenes , 2011, 2011 International Conference on Computer Vision.

[119]  Yair Weiss,et al.  From learning models of natural image patches to whole image restoration , 2011, 2011 International Conference on Computer Vision.

[120]  Andrew Owens,et al.  Discrete-continuous optimization for large-scale structure from motion , 2011, CVPR 2011.

[121]  Bernhard Schölkopf,et al.  Fast removal of non-uniform camera shake , 2011, 2011 International Conference on Computer Vision.

[122]  Peter V. Gehler,et al.  Learning Output Kernels with Block Coordinate Descent , 2011, ICML.

[123]  David A. McAllester,et al.  Object Detection with Grammar Models , 2011, NIPS.

[124]  Antanas Verikas,et al.  Mining data with random forests: A survey and results of new tests , 2011, Pattern Recognit..

[125]  Lei Zhang,et al.  Image Deblurring and Super-Resolution by Adaptive Sparse Domain Selection and Adaptive Regularization , 2010, IEEE Transactions on Image Processing.

[126]  Luc Van Gool,et al.  Scalable multi-class object detection , 2011, CVPR 2011.

[127]  Thomas Serre,et al.  HMDB: A large video database for human motion recognition , 2011, 2011 International Conference on Computer Vision.

[128]  Leonidas J. Guibas,et al.  Human action recognition by learning bases of action attributes and parts , 2011, 2011 International Conference on Computer Vision.

[129]  Hal Daumé,et al.  A Co-training Approach for Multi-view Spectral Clustering , 2011, ICML.

[130]  Christoph H. Lampert,et al.  Learning Multi-View Neighborhood Preserving Projections , 2011, ICML.

[131]  Bernt Schiele,et al.  Evaluating knowledge transfer and zero-shot learning in a large-scale setting , 2011, CVPR 2011.

[132]  Antonis A. Argyros,et al.  Efficient model-based 3D tracking of hand articulations using Kinect , 2011, BMVC.

[133]  Andrew W. Fitzgibbon,et al.  KinectFusion: Real-time dense surface mapping and tracking , 2011, 2011 10th IEEE International Symposium on Mixed and Augmented Reality.

[134]  Carsten Rother,et al.  Fast cost-volume filtering for visual correspondence and beyond , 2011, CVPR 2011.

[135]  Jitendra Malik,et al.  Large Displacement Optical Flow: Descriptor Matching in Variational Motion Estimation , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[136]  Horst Bischof,et al.  Joint motion estimation and segmentation of complex scenes with label costs and occlusion modeling , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[137]  Nitish Srivastava,et al.  Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.

[138]  Maks Ovsjanikov,et al.  Functional maps , 2012, ACM Trans. Graph..

[139]  Marc'Aurelio Ranzato,et al.  Large Scale Distributed Deep Networks , 2012, NIPS.

[140]  Ying Wu,et al.  Mining actionlet ensemble for action recognition with depth cameras , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[141]  Lale Akarun,et al.  Hand Pose Estimation and Hand Shape Classification Using Multi-layered Randomized Decision Forests , 2012, ECCV.

[142]  Shuicheng Yan,et al.  Practical low-rank matrix approximation under robust L1-norm , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[143]  Lena Gorelick,et al.  Minimizing Energies with Hierarchical Costs , 2012, International Journal of Computer Vision.

[144]  Jian Sun,et al.  Face Alignment by Explicit Shape Regression , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[145]  Bernt Schiele,et al.  A database for fine grained activity detection of cooking activities , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[146]  Wolfram Burgard,et al.  A benchmark for the evaluation of RGB-D SLAM systems , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[147]  Yasuyuki Matsushita,et al.  Motion detail preserving optical flow estimation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[148]  Bernt Schiele,et al.  Video Segmentation with Superpixels , 2012, ACCV.

[149]  Michael J. Black,et al.  Layered segmentation and optical flow estimation over time , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[150]  Bernd Jähne,et al.  Outdoor stereo camera system for the generation of real-world benchmark data sets , 2012 .

[151]  Neil D. Lawrence,et al.  Kernels for Vector-Valued Functions: a Review , 2011, Found. Trends Mach. Learn..

[152]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[153]  Xiaoyan Hu,et al.  A Quantitative Evaluation of Confidence Measures for Stereo Vision , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[154]  Aristidis Likas,et al.  Kernel-Based Weighted Multi-view Clustering , 2012, 2012 IEEE 12th International Conference on Data Mining.

[155]  Huchuan Lu,et al.  Visual tracking via adaptive structural local sparse appearance model , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[156]  Guillaume-Alexandre Bilodeau,et al.  An iterative integrated framework for thermal-visible image registration, sensor fusion, and people tracking for video surveillance applications , 2012, Comput. Vis. Image Underst..

[157]  Thomas Deselaers,et al.  Weakly Supervised Localization and Learning with Generic Knowledge , 2012, International Journal of Computer Vision.

[158]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[159]  Andrew W. Fitzgibbon,et al.  Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[160]  Dacheng Tao,et al.  A Survey on Multi-view Learning , 2013, ArXiv.

[161]  G. Zelinsky,et al.  Eye can read your mind: decoding gaze fixations to reveal categorical search targets. , 2013, Journal of vision.

[162]  Rahul Nair,et al.  Ensemble Learning for Confidence Measures in Stereo Vision , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[163]  Alexander M. Bronstein,et al.  Coupled quasi‐harmonic bases , 2012, Comput. Graph. Forum.

[164]  Andrew Blake,et al.  Efficient Human Pose Estimation from Single Depth Images , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[165]  Alexandre Bernardino,et al.  Unifying Nuclear Norm and Bilinear Factorization Approaches for Low-Rank Matrix Decomposition , 2013, 2013 IEEE International Conference on Computer Vision.

[166]  Forrest N. Iandola,et al.  Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction , 2013, 2013 IEEE International Conference on Computer Vision.

[167]  Zicheng Liu,et al.  HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[168]  Richard Bowden,et al.  Hollywood 3D: Recognizing Actions in 3D Natural Scenes , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[169]  Camille Couprie,et al.  Learning Hierarchical Features for Scene Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[170]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[171]  Horst Bischof,et al.  Alternating Regression Forests for Object Detection and Pose Estimation , 2013, 2013 IEEE International Conference on Computer Vision.

[172]  Li Xu,et al.  Unnatural L0 Sparse Representation for Natural Image Deblurring , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[173]  Thomas Brox,et al.  A Unified Video Segmentation Benchmark: Annotation, Metrics and Analysis , 2013, 2013 IEEE International Conference on Computer Vision.

[174]  James A. Reggia,et al.  Robust human action recognition via long short-term memory , 2013, The 2013 International Joint Conference on Neural Networks (IJCNN).

[175]  Marc'Aurelio Ranzato,et al.  DeViSE: A Deep Visual-Semantic Embedding Model , 2013, NIPS.

[176]  Changchang Wu,et al.  Towards Linear-Time Incremental Structure from Motion , 2013, 2013 International Conference on 3D Vision.

[177]  Samuel Berlemont,et al.  BLSTM-RNN Based 3D Gesture Classification , 2013, ICANN.

[178]  C. V. Jawahar,et al.  Blocks That Shout: Distinctive Parts for Scene Classification , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[179]  Dumitru Erhan,et al.  Deep Neural Networks for Object Detection , 2013, NIPS.

[180]  Tae-Kyun Kim,et al.  Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests , 2013, 2013 IEEE International Conference on Computer Vision.

[181]  Andreas Geiger,et al.  Vision meets robotics: The KITTI dataset , 2013, Int. J. Robotics Res..

[182]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[183]  Jake K. Aggarwal,et al.  Spatio-temporal Depth Cuboid Similarity Feature for Activity Recognition Using Depth Camera , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[184]  Li Cheng,et al.  Efficient Hand Pose Estimation from a Single Depth Image , 2013, 2013 IEEE International Conference on Computer Vision.

[185]  John J. Leonard,et al.  Robust real-time visual odometry for dense RGB-D mapping , 2013, 2013 IEEE International Conference on Robotics and Automation.

[186]  Thierry Blu,et al.  Multi-Wiener SURE-LET Deconvolution , 2013, IEEE Transactions on Image Processing.

[187]  C. Lawrence Zitnick,et al.  Structured Forests for Fast Edge Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[188]  Yi Yang,et al.  Articulated Human Detection with Flexible Mixtures of Parts , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[189]  Geoffrey E. Hinton,et al.  On the importance of initialization and momentum in deep learning , 2013, ICML.

[190]  René Vidal,et al.  Sparse Subspace Clustering: Algorithm, Theory, and Applications , 2012, IEEE transactions on pattern analysis and machine intelligence.

[191]  Alexei A. Efros,et al.  Mid-level Visual Element Discovery as Discriminative Mode Seeking , 2013, NIPS.

[192]  Björn Stenger,et al.  Demisting the Hough Transform for 3D Shape Recognition and Registration , 2014, International Journal of Computer Vision.

[193]  Michael J. Black,et al.  A Fully-Connected Layered Model of Foreground and Background Flow , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[194]  Antti Oulasvirta,et al.  Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data , 2013, 2013 IEEE International Conference on Computer Vision.

[195]  Cordelia Schmid,et al.  Action Recognition with Improved Trajectories , 2013, 2013 IEEE International Conference on Computer Vision.

[196]  Sunghyun Cho,et al.  Edge-based blur kernel estimation using patch priors , 2013, IEEE International Conference on Computational Photography (ICCP).

[197]  Stefan K. Gehrig,et al.  Exploiting the Power of Stereo Confidences , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[198]  Jean Ponce,et al.  Learning to Estimate and Remove Non-uniform Image Blur , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[199]  Misha Denil,et al.  Predicting Parameters in Deep Learning , 2014 .

[200]  Guangming Shi,et al.  Nonlocal Image Restoration With Bilateral Variance Estimation: A Low-Rank Approach , 2013, IEEE Transactions on Image Processing.

[201]  Mubarak Shah,et al.  Spatiotemporal Deformable Part Models for Action Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[202]  Lianhong Cai,et al.  Interpretable aesthetic features for affective image classification , 2013, 2013 IEEE International Conference on Image Processing.

[203]  Graham D. Finlayson,et al.  Corrected-Moment Illuminant Estimation , 2013, 2013 IEEE International Conference on Computer Vision.

[204]  Horst Bischof,et al.  Alternating Decision Forests , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[205]  Rama Chellappa,et al.  Human Action Recognition by Representing 3D Skeletons as Points in a Lie Group , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[206]  David J. Fleet,et al.  Computer vision -- ECCV 2014 : 13th European conference Zurich, Switzerland, September 6-12, 2014 : proceedings , 2014 .

[207]  Xiang Zhang,et al.  OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[208]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[209]  Aditya Bhaskara,et al.  Provable Bounds for Learning Some Deep Representations , 2013, ICML.

[210]  Roland Siegwart,et al.  People detection and tracking from aerial thermal views , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[211]  Matthieu Guillaumin,et al.  Incremental Learning of NCM Forests for Large-Scale Image Classification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[212]  Nikos Komodakis,et al.  Learning to Detect Ground Control Points for Improving the Accuracy of Stereo Matching , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[213]  Michal Irani,et al.  Blind Deblurring Using Internal Patch Recurrence , 2014, ECCV.

[214]  Eric Brachmann,et al.  Learning 6D Object Pose Estimation Using 3D Object Coordinates , 2014, ECCV.

[215]  Christian Szegedy,et al.  DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[216]  Chen Qian,et al.  Realtime and Robust Hand Tracking from Depth , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[217]  Margrit Betke,et al.  A Thermal Infrared Video Benchmark for Visual Analysis , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[218]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[219]  Jitendra Malik,et al.  Simultaneous Detection and Segmentation , 2014, ECCV.

[220]  Dacheng Tao,et al.  Large-Margin Multi-ViewInformation Bottleneck , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[221]  Jianjiang Feng,et al.  Smooth Representation Clustering , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[222]  Qiang Chen,et al.  Network In Network , 2013, ICLR.

[223]  Andrew Zisserman,et al.  Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.

[224]  Dumitru Erhan,et al.  Scalable Object Detection Using Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[225]  Fei-Fei Li,et al.  Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[226]  Lei Wang,et al.  Encoding High Dimensional Local Features by Sparse Coding Based Fisher Vectors , 2014, NIPS.

[227]  Shuo Wang,et al.  Predicting human gaze beyond pixels. , 2014, Journal of vision.

[228]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[229]  Jack J. Dongarra,et al.  Scaling up matrix computations on shared-memory manycore systems with 1000 CPU cores , 2014, ICS '14.

[230]  Qi Zhang,et al.  Rolling Guidance Filter , 2014, ECCV.

[231]  Thomas Brox,et al.  Spectral Graph Reduction for Efficient Image and Streaming Video Segmentation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[232]  Daniel Thalmann,et al.  Parsing the Hand in Depth Images , 2014, IEEE Transactions on Multimedia.

[233]  Samy Bengio,et al.  Large-Scale Object Classification Using Label Relation Graphs , 2014, ECCV.

[234]  Bolei Zhou,et al.  Learning Deep Features for Scene Recognition using Places Database , 2014, NIPS.

[235]  Trevor Darrell,et al.  Part-Based R-CNNs for Fine-Grained Category Detection , 2014, ECCV.

[236]  Thomas Brox,et al.  Dense Semi-rigid Scene Flow Estimation from RGBD Images , 2014, ECCV.

[237]  Bernt Schiele,et al.  Learning Must-Link Constraints for Video Segmentation Based on Spectral Clustering , 2014, GCPR.

[238]  Svetlana Lazebnik,et al.  Multi-scale Orderless Pooling of Deep Convolutional Activation Features , 2014, ECCV.

[239]  Ken Perlin,et al.  Real-Time Continuous Pose Recovery of Human Hands Using Convolutional Networks , 2014, ACM Trans. Graph..

[240]  Xiaodong Yang,et al.  Super Normal Vector for Activity Recognition Using Depth Sequences , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[241]  Ling Shao,et al.  Leveraging Hierarchical Parametric Networks for Skeletal Joints Based Action Segmentation and Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[242]  Noah Snavely,et al.  Robust Global Translations with 1DSfM , 2014, ECCV.

[243]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[244]  Tae Hyun Kim,et al.  Segmentation-Free Dynamic Scene Deblurring , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[245]  Konrad Schindler,et al.  VocMatch: Efficient Multiview Correspondence for Structure from Motion , 2014, ECCV.

[246]  Andrew G. Howard,et al.  Some Improvements on Deep Convolutional Neural Network Based Image Classification , 2013, ICLR.

[247]  Limin Wang,et al.  Video Action Detection with Relational Dynamic-Poselets , 2014, ECCV.

[248]  Tae-Kyun Kim,et al.  Latent Regression Forest: Structured Estimation of 3D Articulated Hand Posture , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[249]  Andrew Zisserman,et al.  Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps , 2013, ICLR.

[250]  T. Tuytelaars,et al.  Weakly Supervised Object Detection with Posterior Regularization , 2014 .

[251]  Tobias Gurdan Sparse Modeling of Intrinsic Correspondences , 2014 .

[252]  Jitendra Malik,et al.  R-CNNs for Pose Estimation and Action Detection , 2014, ArXiv.

[253]  Nojun Kwak,et al.  Efficient $l_{1}$ -Norm-Based Low-Rank Matrix Approximations for Large-Scale Problems Using Alternating Rectified Gradient Method , 2015, IEEE Transactions on Neural Networks and Learning Systems.

[254]  Qing Fei,et al.  Pedestrian Classification and Detection in Far Infrared Images , 2015, ICIRA.

[255]  Ali Borji,et al.  What do eyes reveal about the mind?: Algorithmic inference of search targets from fixations , 2015, Neurocomputing.

[256]  Jan-Michael Frahm,et al.  From single image query to detailed 3D reconstruction , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[257]  Jan-Michael Frahm,et al.  Reconstructing the World* in Six Days *(As Captured by the Yahoo 100 Million Image Dataset) , 2015, CVPR 2015.