Paper information - Multi-scale volumes for deep object detection and localization

This study analyzes the benefits of improved multi-scale reasoning for object detection and localization with deep convolutional neural networks. To that end, an efficient and general object detection framework that operates on scale volumes of a deep feature pyramid is proposed. In contrast to the proposed approach, most current state-of-the-art object detectors operate on a single scale in training, while testing involves independent evaluation across scales. One benefit of the proposed approach is that it better captures multi-scale contextual information, resulting in significant gains in both detection performance and localization quality of objects on the PASCAL VOC dataset and a multi-view highway vehicles dataset. The joint detection and localization scale-specific models are shown to especially benefit detection of challenging object categories which exhibit large scale variation, as well as detection of small objects.

Highlights:
Multi-scale feature reasoning for deep object detection in images is analyzed.
A multi-scale contextual reasoning approach is proposed using multi-scale volumes.
Scale-specific, joint detection and localization models increase robustness.
The approach efficiently handles challenging cases of large variation in scale.

Mohan M. Trivedi | Eshed Ohn-Bar
