A Bottom-Up Approach for Learning Visual Object Detection Models from Unreliable Sources

The ability to learn computer vision models from sample data has significantly advanced the field. Obtaining suitable sets of training images, however, remains a challenging problem. In this paper we propose a bottom-up approach for learning object detection models from weakly annotated samples, i.e., images for which only category labels are given. By combining visual saliency with the distinctiveness of local image features, regions of interest are extracted fully automatically, without requiring detailed annotations. Using a bag-of-features representation of these regions, object recognition models can be trained for the given object categories. Because weakly labeled sample images can easily be obtained from image search engines, our approach requires no manual annotation effort. Experiments on data from the Visual Object Classes Challenge 2011 show that the proposed method achieves promising object detection results.
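
To make the bag-of-features step concrete, the following minimal sketch shows how the local descriptors of one extracted region could be turned into a fixed-length histogram. It is an illustration under stated assumptions, not the method as implemented in the paper: the function name `bag_of_features`, the toy 2-D descriptors, the three-word codebook, nearest-codeword assignment with Euclidean distance, and L1 normalization are all hypothetical choices that the abstract does not specify.

```python
from math import dist


def bag_of_features(descriptors, codebook):
    """Return an L1-normalized histogram of nearest-codeword assignments.

    Assumption: hard assignment with Euclidean distance; the abstract does
    not state which assignment or normalization scheme is used.
    """
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        # Index of the codeword closest to this descriptor.
        nearest = min(range(len(codebook)),
                      key=lambda k: dist(descriptor, codebook[k]))
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        # A region without descriptors yields an all-zero histogram.
        return [0.0] * len(codebook)
    return [count / total for count in counts]


# Toy data (hypothetical): 2-D stand-ins for real local feature descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
region_descriptors = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.7, 5.2)]

histogram = bag_of_features(region_descriptors, codebook)
print(histogram)  # [0.25, 0.5, 0.25]
```

A histogram of this kind has the same length for every region regardless of how many local features the region contains, which is what allows a standard classifier to be trained on regions of varying size.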
