Saliency moments for image categorization

In this paper we present Saliency Moments, a new holistic descriptor for image recognition inspired by two principles of biological vision: gist perception and selective visual attention. While traditional image features extract either local or global discriminative properties from the visual content, we use a hybrid approach that exploits coarsely localized information, namely the shape and contours of the salient regions, to build a global, low-dimensional image signature. Results show that this new type of image description outperforms traditional global features on scene and object categorization across a variety of challenging datasets. Moreover, we show that, when combined with other existing descriptors (SIFT, Color Moments, Wavelet Feature and Edge Histogram), the saliency-based features provide complementary information, improving the precision of a retrieval system we built for TRECVID 2010.
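
The idea of turning coarsely localized saliency into a global, low-dimensional signature can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the saliency detector, the spatial layout, or which moments are used. The sketch assumes a spectral-residual detector as a stand-in, a hypothetical 4 x 4 grid, and the mean, standard deviation, and skewness of each cell. The function names (`box_blur`, `spectral_residual_saliency`, `saliency_signature`) and the `grid` parameter are illustrative only.

```python
import numpy as np


def box_blur(a, k=3):
    """Mean filter with a k x k window and edge padding (NumPy only)."""
    pad = k // 2
    padded = np.pad(a, pad, mode="edge")
    out = np.zeros(a.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out / (k * k)


def spectral_residual_saliency(gray):
    """Stand-in saliency detector (assumption): spectral residual of the
    log-amplitude spectrum, mapped back to the image domain."""
    spectrum = np.fft.fft2(gray.astype(float))
    log_amp = np.log(np.abs(spectrum) + 1e-8)
    phase = np.angle(spectrum)
    # Residual = log amplitude minus its local average.
    residual = log_amp - box_blur(log_amp, 3)
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    sal = box_blur(sal, 3)  # light smoothing of the raw map
    return sal / (sal.max() + 1e-12)  # scale to [0, 1]


def saliency_signature(gray, grid=4):
    """Global signature: mean, std and skewness of the saliency map in each
    cell of a grid x grid layout (hypothetical layout), concatenated."""
    sal = spectral_residual_saliency(gray)
    h, w = sal.shape
    feats = []
    for rows in np.array_split(np.arange(h), grid):
        for cols in np.array_split(np.arange(w), grid):
            cell = sal[np.ix_(rows, cols)]
            mu = cell.mean()
            sigma = cell.std()
            skew = ((cell - mu) ** 3).mean() / (sigma ** 3 + 1e-12)
            feats.extend([mu, sigma, skew])
    return np.asarray(feats)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((64, 48))  # synthetic grayscale image
    signature = saliency_signature(image, grid=4)
    print(signature.shape)  # 4 * 4 cells, 3 moments each
```

Under these assumptions the signature has 3 x grid x grid dimensions regardless of image size, which is what makes it a holistic descriptor: the grid keeps only a coarse notion of where salient structure lies, and the moments summarize its distribution within each cell.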
