Object detection at multiple scales improves accuracy

For detecting objects in natural visual scenes, several powerful image features have been proposed that can collectively be described as spatial histograms of oriented energy. The HoG [3], HMAX C1 [12], SIFT [10], and shape context [2] features all represent an input image using a discrete set of bins that accumulate evidence for oriented structures over a spatial region and a range of orientations. In this work, we generalize these techniques to accept a foveated input image rather than a rectilinear raster, with the aim of improving object detection accuracy. The system leverages a spectrum of image measurements, from sharp, fine-scale sampling within a small spatial region to coarse-scale sampling of a wide field of view. In our experiments, we show that features generated from the foveated input format produce more accurate detectors, as measured for four object types from commonly available datasets.
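To make the idea concrete, the following is a minimal sketch of a spatial histogram of oriented energy computed over a foveated input. It is an illustration under stated assumptions, not the authors' implementation: the concentric-window pyramid, the block-average downsampling, and all names and parameters (`foveated_levels`, `orientation_histograms`, `size`, `levels`, `cells`, `bins`) are hypothetical choices made for this example.

```python
import numpy as np


def foveated_levels(image, center, size=32, levels=3):
    """Crop concentric windows around `center` from a 2-D grayscale image.

    Level k covers size * 2**k pixels per side and is block-averaged down to
    size x size, so every level has the same pixel count: the innermost is
    sharp and narrow, the outermost is coarse and wide.
    (Assumed sampling scheme, chosen for illustration only.)
    """
    cy, cx = center
    out = []
    for k in range(levels):
        step = 2 ** k
        width = size * step
        half = width // 2
        # Edge-pad so windows near the image border remain valid.
        padded = np.pad(image.astype(float), half, mode="edge")
        window = padded[cy:cy + width, cx:cx + width]
        # Block-average each step x step tile into one output pixel.
        out.append(window.reshape(size, step, size, step).mean(axis=(1, 3)))
    return out


def orientation_histograms(patch, cells=4, bins=8):
    """Accumulate gradient magnitude into a cells x cells x bins histogram."""
    gy, gx = np.gradient(patch.astype(float))
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, pi), quantized into `bins` bins.
    angle = np.mod(np.arctan2(gy, gx), np.pi)
    bin_idx = np.minimum((angle / np.pi * bins).astype(int), bins - 1)
    h, w = patch.shape
    row_cell = np.minimum(np.arange(h) * cells // h, cells - 1)
    col_cell = np.minimum(np.arange(w) * cells // w, cells - 1)
    rr, cc = np.meshgrid(row_cell, col_cell, indexing="ij")
    hist = np.zeros((cells, cells, bins))
    np.add.at(hist, (rr, cc, bin_idx), magnitude)
    return hist


def foveated_descriptor(image, center, size=32, levels=3, cells=4, bins=8):
    """Concatenate the orientation histograms of every foveated level."""
    parts = [orientation_histograms(level, cells, bins).ravel()
             for level in foveated_levels(image, center, size, levels)]
    return np.concatenate(parts)
```

With the default parameters, the descriptor has `levels * cells * cells * bins` entries, and each level contributes the same number of bins regardless of how wide a field of view it covers. A detector would then be trained on these vectors in place of features computed from a single rectilinear window.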

[1]  T. Poggio,et al.  Hierarchical models of object recognition in cortex , 1999, Nature Neuroscience.

[2]  Luc Van Gool,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[3]  Antonio Torralba,et al.  Contextual Priming for Object Detection , 2003, International Journal of Computer Vision.

[4]  Lior Wolf,et al.  A Critical View of Context , 2006, International Journal of Computer Vision.

[5]  Antonio Torralba,et al.  LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.

[6]  Y. LeCun,et al.  Learning methods for generic object recognition with invariance to pose and lighting , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004.

[7]  David G. Lowe,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[8]  Alexei A. Efros,et al.  Discovering objects and their location in images , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[9]  David G. Lowe,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[10]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[11]  Ales Leonardis,et al.  Context Driven Focus of Attention for Object Detection , 2008, WAPCV.

[12]  Takeo Kanade,et al.  A statistical method for 3D object detection applied to faces and cars , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[13]  Thomas Serre,et al.  Robust Object Recognition with Cortex-Like Mechanisms , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Takeo Kanade,et al.  A statistical approach to 3d object detection applied to faces and cars , 2000 .

[15]  Jitendra Malik,et al.  Shape matching and object recognition using shape contexts , 2010, 2010 3rd International Conference on Computer Science and Information Technology.

[16]  D. Hubel,et al.  Receptive fields, binocular interaction and functional architecture in the cat's visual cortex , 1962, The Journal of physiology.

[17]  Tomaso Poggio,et al.  Robust Object Recognition with Cortex-Like Mechanisms , 2007 .

[18]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[19]  Kunihiko Fukushima.  Neocognitron capable of incremental learning , 2004, Neural Networks.

[20]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[21]  J. Friedman.  Special Invited Paper-Additive logistic regression: A statistical view of boosting , 2000 .