Efficient Subwindow Search: A Branch and Bound Framework for Object Localization

Most successful object recognition systems rely on binary classification, deciding only if an object is present or not, but not providing information on the actual object location. To estimate the object's location, one can take a sliding window approach, but this strongly increases the computational cost because the classifier or similarity function has to be evaluated over a large set of candidate subwindows. In this paper, we propose a simple yet powerful branch and bound scheme that allows efficient maximization of a large class of quality functions over all possible subimages. It converges to a globally optimal solution typically in linear or even sublinear time, in contrast to the quadratic scaling of exhaustive or sliding window search. We show how our method is applicable to different object detection and image retrieval scenarios. The achieved speedup allows the use of classifiers for localization that formerly were considered too slow for this task, such as SVMs with a spatial pyramid kernel or nearest-neighbor classifiers based on the lambda2 distance. We demonstrate state-of-the-art localization performance of the resulting systems on the UIUC Cars data set, the PASCAL VOC 2006 data set, and in the PASCAL VOC 2007 competition.

[1]  E. L. Lawler,et al.  Branch-and-Bound Methods: A Survey , 1966, Oper. Res..

[2]  Jan Karel Lenstra,et al.  BRANCHING FROM THE LARGEST UPPER BOUND: FOLKLORE AND FACTS , 1978 .

[3]  Thomas M. Breuel,et al.  Fast recognition using adaptive subdivisions of transformation space , 1992, Proceedings 1992 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[4]  Teodor Gabriel Crainic,et al.  PARALLEL BRANCH-AND-BOUND ALGORITHMS: SURVEY AND SYNTHESIS , 1993 .

[5]  Daniel P. Huttenlocher,et al.  Comparing Images Using the Hausdorff Distance , 1993, IEEE Trans. Pattern Anal. Mach. Intell..

[6]  Bernard Gendron,et al.  Parallel Branch-and-Branch Algorithms: Survey and Synthesis , 1994, Oper. Res..

[7]  Takeo Kanade,et al.  Human Face Detection in Visual Scenes , 1995, NIPS.

[8]  Bernt Schiele,et al.  Object Recognition Using Multidimensional Receptive Field Histograms , 1996, ECCV.

[9]  David M. Mount,et al.  Efficient algorithms for robust feature matching , 1999, Pattern Recognit..

[10]  Frédéric Jurie,et al.  Solution of the Simultaneous Pose and Correspondence Problem Using Gaussian Error Model , 1999, Comput. Vis. Image Underst..

[11]  Andrew Zisserman,et al.  Viewpoint invariant texture matching and wide baseline stereo , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[12]  M. H. van Emden,et al.  Interval arithmetic: From principles to implementation , 2001, JACM.

[13]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[14]  Clark F. Olson,et al.  Locating geometric primitives by pruning the parameter space , 2001, Pattern Recognit..

[15]  Pietro Perona,et al.  Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[16]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[17]  Thomas M. Breuel On the use of interval arithmetic in geometric branch and bound algorithms , 2003, Pattern Recognit. Lett..

[18]  Francesca Odone,et al.  Histogram intersection kernel for image classification , 2003, Proceedings 2003 International Conference on Image Processing (Cat. No.03CH37429).

[19]  Michael J. Swain,et al.  Color indexing , 1991, International Journal of Computer Vision.

[20]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.

[21]  Dan Roth,et al.  Learning to detect objects in images via a sparse, part-based representation , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22]  Remco C. Veltkamp,et al.  Reliable and Efficient Pattern Matching Using an Affine Invariant Metric , 1999, International Journal of Computer Vision.

[23]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[24]  Bernt Schiele,et al.  Integrating representative and discriminant models for object category detection , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[25]  Nozha Boujemaa,et al.  Generalized histogram intersection kernel for image recognition , 2005, IEEE International Conference on Image Processing 2005.

[26]  Fatih Murat Porikli,et al.  Integral histogram: a fast way to extract histograms in Cartesian spaces , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[27]  Luc Van Gool,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[28]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[29]  Jorma Laaksonen,et al.  Techniques for Still Image Scene Classification and Object Detection , 2006, ICANN.

[30]  David G. Lowe,et al.  Multiclass Object Recognition with Sparse, Localized Features , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[31]  Andrew Zisserman,et al.  An Exemplar Model for Learning Object Classes , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Bernt Schiele,et al.  Robust Object Detection with Interleaved Categorization and Segmentation , 2008, International Journal of Computer Vision.

[33]  Trevor Darrell,et al.  The Pyramid Match Kernel: Efficient Learning with Sets of Features , 2007, J. Mach. Learn. Res..

[34]  Frédéric Jurie,et al.  Groups of Adjacent Contour Segments for Object Detection , 2008, IEEE Trans. Pattern Anal. Mach. Intell..

[35]  Christoph H. Lampert,et al.  Beyond sliding windows: Object localization by efficient subwindow search , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Christoph H. Lampert,et al.  Learning to Localize Objects with Structured Output Regression , 2008, ECCV.

[37]  Subhransu Maji,et al.  Classification using intersection kernel support vector machines is efficient , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  Ivan Laptev,et al.  Improving object detection with boosted histograms , 2009, Image Vis. Comput..

[39]  Christopher Hunt,et al.  Notes on the OpenSURF Library , 2009 .

[40]  Trevor Darrell,et al.  Fast concurrent object localization and recognition , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  Pavel Vácha,et al.  Query by Pictorial Example , 2011 .