Interactive Hierarchical Object Proposals

Object proposal algorithms have proven very successful at accelerating the object detection process. High object localization quality and detection recall can be obtained using thousands of proposals. However, performance with a small number of proposals is still unsatisfactory. This paper demonstrates that the performance of a few proposals can be significantly improved with minimal human interaction: a single touch point. To this end, we first generate hierarchical superpixels using an efficient tree-organized structure as our initial object proposals, and then select only a few proposals from them by learning an effective convolutional neural network for objectness ranking. We explore and design an architecture that integrates human interaction with the global information of the whole image for objectness scoring, which significantly improves performance with a minimal number of object proposals. Extensive experiments show that the proposed method outperforms all state-of-the-art methods at locating the meaningful object under the touch point constraint. Furthermore, the proposed method is extended to video. Combined with a novel interactive motion segmentation cue for generating hierarchical superpixels, the performance with a single proposal is satisfactory, and the method can be used in interactive vision systems, for example to select the input of a real-time tracking system.
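
The selection step described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes proposals are axis-aligned boxes taken from the superpixel hierarchy, keeps only those containing the touch point, and ranks them with a caller-supplied scoring function. The names `contains`, `rank_proposals`, and `center_distance_score` are hypothetical, and the toy score merely stands in for the learned convolutional objectness ranker.

```python
import math
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive
Point = Tuple[int, int]


def contains(box: Box, point: Point) -> bool:
    """Return True if the point lies inside the box (borders included)."""
    x0, y0, x1, y1 = box
    x, y = point
    return x0 <= x <= x1 and y0 <= y <= y1


def rank_proposals(
    boxes: List[Box],
    touch: Point,
    score: Callable[[Box], float],
    k: int = 1,
) -> List[Box]:
    """Keep boxes that contain the touch point and return the k best by score."""
    hits = [b for b in boxes if contains(b, touch)]
    hits.sort(key=score, reverse=True)
    return hits[:k]


def center_distance_score(touch: Point) -> Callable[[Box], float]:
    """Toy stand-in for a learned objectness score (an assumption, not the CNN).

    Boxes whose center is closer to the touch point receive a higher score.
    """
    def score(box: Box) -> float:
        cx = (box[0] + box[2]) / 2.0
        cy = (box[1] + box[3]) / 2.0
        return -math.hypot(cx - touch[0], cy - touch[1])
    return score


if __name__ == "__main__":
    # Hypothetical nested proposals, plus one box that misses the touch point.
    proposals = [(0, 0, 99, 99), (10, 10, 49, 49), (20, 20, 29, 29), (60, 60, 89, 89)]
    touch_point = (25, 25)
    best = rank_proposals(proposals, touch_point, center_distance_score(touch_point), k=2)
    print(best)
```

In a real system the scoring callable would be replaced by a trained network that sees both the touch point and the whole image; the filtering step only shows how a single touch can prune the candidate set before ranking.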
