Learning to Co-Generate Object Proposals with a Deep Structured Network

Generating object proposals has become a key component of modern object detection pipelines. However, most existing methods generate the object candidates independently of each other. In this paper, we present an approach to co-generating object proposals in multiple images, thus leveraging the collective power of multiple object candidates. In particular, we introduce a deep structured network that jointly predicts the objectness scores and the bounding box locations of multiple object candidates. Our deep structured network consists of a fully-connected Conditional Random Field built on top of a set of deep Convolutional Neural Networks, which learn features to model both the individual object candidates and the similarity between multiple candidates. To train our deep structured network, we develop an end-to-end learning algorithm that, by unrolling the CRF inference procedure, lets us backpropagate the loss gradient throughout the entire structured network. We demonstrate the effectiveness of our approach on two benchmark datasets, showing significant improvement over state-of-the-art object proposal algorithms.
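To illustrate the idea of unrolling CRF inference, the following is a minimal sketch of mean-field inference over a fully-connected CRF with binary objectness labels. It is not the paper's implementation. The function names, the Potts-style pairwise term, and the fixed iteration count are assumptions made for illustration, and the sketch ignores the bounding box location variables entirely. In the abstract's setting, the unary scores and pairwise similarities would come from the CNNs; here they are hand-set numbers.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def unrolled_mean_field(unary, similarity, n_iters=5):
    """Run a fixed number of mean-field updates (hypothetical helper).

    unary:      list of [background_score, object_score] per candidate.
    similarity: N x N symmetric matrix of non-negative pairwise similarities;
                similar candidates are encouraged to share a label
                (an assumed Potts-style pairwise term).
    Returns the approximate marginals q[i] = [P(background), P(object)].
    """
    n = len(unary)
    q = [softmax(u) for u in unary]  # initialise from the unary scores alone
    for _ in range(n_iters):
        new_q = []
        for i in range(n):
            # Message: similarity-weighted beliefs of all other candidates.
            message = [
                sum(similarity[i][j] * q[j][label] for j in range(n) if j != i)
                for label in range(2)
            ]
            new_q.append(softmax([unary[i][k] + message[k] for k in range(2)]))
        q = new_q  # parallel update: every candidate uses the previous beliefs
    return q


# Candidate 0 is confidently an object, candidate 1 is ambiguous but similar
# to candidate 0, and candidate 2 is confidently background and unrelated.
unary = [[0.0, 3.0], [0.0, 0.0], [3.0, 0.0]]
similarity = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
marginals = unrolled_mean_field(unary, similarity)
```

Each update consists only of weighted sums and a softmax, so a fixed number of iterations forms a differentiable computation graph. That property is what allows a loss gradient to be backpropagated through inference into the networks that produce the scores.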
