OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks

We present an integrated framework for using Convolutional Networks for classification, localization and detection. We show how a multiscale and sliding window approach can be efficiently implemented within a ConvNet. We also introduce a novel deep learning approach to localization by learning to predict object boundaries. Bounding boxes are then accumulated rather than suppressed in order to increase detection confidence. We show that different tasks can be learned simultaneously using a single shared network. This integrated framework is the winner of the localization task of the ImageNet Large Scale Visual Recognition Challenge 2013 (ILSVRC2013) and obtained very competitive results for the detection and classification tasks. In post-competition work, we establish a new state of the art for the detection task. Finally, we release a feature extractor from our best model called OverFeat.
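A sliding window can be implemented efficiently within a ConvNet because a classifier trained on fixed-size inputs can be applied convolutionally to a larger input. The convolutional features are computed once and shared by all overlapping windows, and the "fully connected" classifier then acts as a convolution over the resulting feature map. The following is a minimal 1-D sketch of that equivalence in plain Python. It makes several simplifying assumptions: a hypothetical toy network with one convolution and one dense classifier, made-up weights, stride 1 everywhere, and no pooling. It is an illustration of the idea, not OverFeat's architecture or code.

```python
from typing import List


def conv1d(x: List[float], kernel: List[float]) -> List[float]:
    """Valid (no padding), stride-1 1-D cross-correlation, as used in ConvNets."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) for i in range(len(x) - k + 1)]


def relu(v: List[float]) -> List[float]:
    return [max(0.0, a) for a in v]


# Hypothetical toy network (illustrative weights, not from the paper):
# one conv layer, then a "fully connected" classifier that reads exactly
# len(FC_WEIGHTS) conv features and outputs a single score.
CONV_KERNEL = [0.5, -1.0, 2.0]
FC_WEIGHTS = [1.0, 0.25, -0.5, 0.75]
# Input length the classifier was "trained" on: 4 features need 4 + 3 - 1 inputs.
WINDOW = len(FC_WEIGHTS) + len(CONV_KERNEL) - 1


def classify_window(window: List[float]) -> float:
    """Run the whole toy net on one fixed-size window and return one score."""
    assert len(window) == WINDOW
    features = relu(conv1d(window, CONV_KERNEL))
    return sum(f * w for f, w in zip(features, FC_WEIGHTS))


def naive_sliding_window(x: List[float]) -> List[float]:
    """Crop every window and run the full net on it.

    Overlapping windows recompute the same conv features many times.
    """
    return [classify_window(x[i:i + WINDOW]) for i in range(len(x) - WINDOW + 1)]


def convolutional_sliding_window(x: List[float]) -> List[float]:
    """Compute conv features once over the whole input, then apply the
    dense classifier as a convolution over the feature map."""
    features = relu(conv1d(x, CONV_KERNEL))
    return conv1d(features, FC_WEIGHTS)


if __name__ == "__main__":
    signal = [0.0, 1.0, 3.0, -2.0, 4.0, 1.5, 0.5, -1.0, 2.0, 0.0]
    print(naive_sliding_window(signal))
    print(convolutional_sliding_window(signal))
```

Both functions return one score per window position, and the scores match. The convolutional version computes each conv feature once instead of once for every window that overlaps it. In a real ConvNet with strided pooling, the output map is subsampled by the network's total stride, so neighbouring outputs correspond to windows several pixels apart rather than one.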
