Object detection in 20 questions

We propose a novel, general strategy for object detection. Instead of passively evaluating all object detectors at all possible locations in an image, we develop a divide-and-conquer approach that actively and sequentially evaluates contextual cues related to the query, conditioned on the scene and on previous evaluations (like playing a "20 Questions" game), to decide where to search for the object. We formulate the problem as a Markov Decision Process and learn a search policy by reinforcement learning. To demonstrate the efficacy of our generic algorithm, we apply the 20 questions approach within the recent framework of simultaneous object detection and segmentation. Experimental results on the Pascal VOC dataset show that our algorithm reduces the number of object proposals by about 45.3% and the average evaluation time by 36%, while achieving better average precision than exhaustive search.
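
To make the MDP formulation concrete, the following is a minimal toy sketch, not the method evaluated in the paper. It assumes a hypothetical image split into four regions, two hypothetical scene types with made-up location priors, a fixed per-evaluation cost, and plain tabular Q-learning as the reinforcement learner. The state is the scene plus the set of regions already evaluated; the action is which region to evaluate next. All names and numbers (`PRIORS`, `STEP_COST`, `train`, `greedy_first_probe`) are illustrative assumptions.

```python
import random
from collections import defaultdict

N_REGIONS = 4
STEP_COST = 0.2       # assumed cost of evaluating one region
FOUND_REWARD = 1.0    # assumed reward for locating the object

# Hypothetical scene-conditional priors over the object's region.
PRIORS = {
    "street": [0.85, 0.05, 0.05, 0.05],
    "indoor": [0.05, 0.05, 0.05, 0.85],
}


def unprobed(mask):
    """Regions not yet evaluated, given a bitmask of evaluated regions."""
    return [a for a in range(N_REGIONS) if not mask & (1 << a)]


def train(episodes=20000, alpha=0.1, epsilon=0.2, seed=0):
    """Tabular Q-learning over states (scene, probed_mask), undiscounted."""
    rng = random.Random(seed)
    q = defaultdict(float)  # (scene, mask, action) -> estimated return
    scenes = sorted(PRIORS)
    for _ in range(episodes):
        scene = rng.choice(scenes)
        target = rng.choices(range(N_REGIONS), weights=PRIORS[scene])[0]
        mask = 0
        while True:
            actions = unprobed(mask)
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                action = max(actions, key=lambda a: q[(scene, mask, a)])
            found = action == target
            reward = (FOUND_REWARD if found else 0.0) - STEP_COST
            next_mask = mask | (1 << action)
            if found:
                td_target = reward  # episode ends once the object is found
            else:
                td_target = reward + max(
                    q[(scene, next_mask, b)] for b in unprobed(next_mask)
                )
            key = (scene, mask, action)
            q[key] += alpha * (td_target - q[key])
            if found:
                break
            mask = next_mask
    return q


def greedy_first_probe(q, scene):
    """Region the learned policy evaluates first in a given scene."""
    return max(range(N_REGIONS), key=lambda a: q[(scene, 0, a)])


if __name__ == "__main__":
    q = train()
    for scene in sorted(PRIORS):
        print(scene, "-> first region evaluated:", greedy_first_probe(q, scene))
```

Under these assumptions the learned policy should come to evaluate the most probable region for each scene first, which is the sense in which a sequential policy can examine fewer candidates than exhaustive search. The actual system replaces the toy scene label with contextual cues and image evidence.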
