Paper information - Contextual Priming and Feedback for Faster R-CNN

The field of object detection has seen dramatic performance improvements in the last few years. Most of these gains are attributed to bottom-up, feedforward ConvNet frameworks. In humans, however, top-down information, context, and feedback play an important role in object detection. This paper investigates how top-down information and feedback can be incorporated into the state-of-the-art Faster R-CNN framework. Specifically, we propose to: (a) augment Faster R-CNN with a semantic segmentation network; (b) use segmentation for top-down contextual priming; (c) use segmentation to provide top-down iterative feedback using two-stage training. Our results indicate that all three contributions improve performance on object detection, semantic segmentation, and region proposal generation.
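As a rough illustration of idea (b), the sketch below shows one way a segmentation output could prime detection features: per-pixel class probabilities are appended to the convolutional feature map as extra channels. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not say how priming is wired in, so channel concatenation is an assumption, and the names `softmax_over_classes` and `prime_features` are hypothetical.

```python
import math


def softmax_over_classes(logits):
    """Per-pixel softmax: logits[k][y][x] -> probabilities[k][y][x]."""
    num_classes = len(logits)
    height, width = len(logits[0]), len(logits[0][0])
    probs = [[[0.0] * width for _ in range(height)] for _ in range(num_classes)]
    for y in range(height):
        for x in range(width):
            column = [logits[k][y][x] for k in range(num_classes)]
            peak = max(column)  # subtract the max for numerical stability
            exps = [math.exp(v - peak) for v in column]
            total = sum(exps)
            for k in range(num_classes):
                probs[k][y][x] = exps[k] / total
    return probs


def prime_features(conv_features, seg_logits):
    """Assumed priming step: append class probabilities as extra channels.

    conv_features: list of C channel maps, each height x width.
    seg_logits:    list of K class-score maps at the same resolution.
    Returns a new list of C + K channel maps.
    """
    return conv_features + softmax_over_classes(seg_logits)


# Toy example: 2 feature channels and 3 segmentation classes on a 2x2 grid.
conv_features = [[[0.1, 0.2], [0.3, 0.4]],
                 [[0.5, 0.6], [0.7, 0.8]]]
seg_logits = [[[2.0, 0.0], [0.0, 0.0]],
              [[0.0, 2.0], [0.0, 0.0]],
              [[0.0, 0.0], [2.0, 0.0]]]
primed = prime_features(conv_features, seg_logits)
```

In this toy setup the region proposal and detection heads would consume `primed` instead of `conv_features`. Idea (c), iterative feedback, could reuse the same pattern by feeding a first-stage segmentation into a second pass, though that too is an assumption rather than something the abstract specifies.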

Abhinav Gupta | Abhinav Shrivastava
