Learning to Detect a Salient Object

In this paper, we study the salient object detection problem for images. We formulate this problem as a binary labeling task where we separate the salient object from the background. We propose a set of novel features, including multiscale contrast, center-surround histogram, and color spatial distribution, to describe a salient object locally, regionally, and globally. A conditional random field is learned to effectively combine these features for salient object detection. Further, we extend the proposed approach to detect a salient object from sequential images by introducing the dynamic salient features. We collected a large image database containing tens of thousands of carefully labeled images by multiple users and a video segment database, and conducted a set of experiments over them to demonstrate the effectiveness of the proposed approach.

[1]  A. Treisman,et al.  A feature-integration theory of attention , 1980, Cognitive Psychology.

[2]  S Ullman,et al.  Shifts in selective visual attention: towards the underlying neural circuitry. , 1985, Human neurobiology.

[3]  John K. Tsotsos,et al.  Modeling Visual Attention via Selective Tuning , 1995, Artif. Intell..

[4]  C. Koch,et al.  Models of bottom-up and top-down visual attention , 2000 .

[5]  Marie-Pierre Jolly,et al.  Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[6]  C. Koch,et al.  Computational modelling of visual attention , 2001, Nature Reviews Neuroscience.

[7]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[8]  Dorin Comaniciu,et al.  Mean Shift: A Robust Approach Toward Feature Space Analysis , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[9]  HongJiang Zhang,et al.  A model of motion attention for video skimming , 2002, Proceedings. International Conference on Image Processing.

[10]  Christof Koch,et al.  Attentional Selection for Object Recognition - A Gentle Way , 2002, Biologically Motivated Computer Vision.

[11]  Xavier Cufí,et al.  Yet Another Survey on Image Segmentation: Region and Boundary Information Integration , 2002, ECCV.

[12]  HongJiang Zhang,et al.  Contrast-based image attention analysis by using fuzzy growing , 2003, MULTIMEDIA '03.

[13]  Xing Xie,et al.  A visual attention model for adapting images on small displays , 2003, Multimedia Systems.

[14]  Shimon Ullman,et al.  Combining Top-Down and Bottom-Up Segmentation , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[15]  Jitendra Malik,et al.  Learning to detect natural image boundaries using local brightness, color, and texture cues , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Pietro Perona,et al.  Is bottom-up attention useful for object recognition? , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[17]  Patrick Pérez,et al.  Interactive Image Segmentation Using an Adaptive GMMRF Model , 2004, ECCV.

[18]  Andrew Blake,et al.  "GrabCut" , 2004, ACM Trans. Graph..

[19]  Harry Shum,et al.  Video object cut and paste , 2005, ACM Trans. Graph..

[20]  Jitendra Malik,et al.  Cue Integration for Figure/Ground Labeling , 2005, NIPS.

[21]  John K. Tsotsos,et al.  Saliency Based on Information Maximization , 2005, NIPS.

[22]  Pierre Baldi,et al.  A principled approach to detecting surprising events in video , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[23]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[24]  Marc Parizeau,et al.  Incremental discovery of object parts in video sequences , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[25]  Fatih Murat Porikli,et al.  Integral histogram: a fast way to extract histograms in Cartesian spaces , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[26]  Mubarak Shah,et al.  Visual attention detection in video sequences using spatiotemporal cues , 2006, MM '06.

[27]  Michael Gleicher,et al.  Region Enhanced Scale-Invariant Saliency Detection , 2006, 2006 IEEE International Conference on Multimedia and Expo.

[28]  Nebojsa Jojic,et al.  Escaping local minima through hierarchical model selection: Automatic object discovery, segmentation, and tracking in video , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[29]  David Salesin,et al.  Gaze-based interaction for semi-automatic photo cropping , 2006, CHI.

[30]  Bernt Schiele,et al.  Segmentation Based Multi-Cue Integration for Object Detection , 2006, BMVC.

[31]  Laurent Itti,et al.  An Integrated Model of Top-Down and Bottom-Up Attention for Optimizing Detection Speed , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[32]  Michael Gleicher,et al.  Video retargeting: automating pan and scan , 2006, MM '06.

[33]  Vladimir Kolmogorov,et al.  Convergent Tree-Reweighted Message Passing for Energy Minimization , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  L. Itti,et al.  Visual causes versus correlates of attentional selection in dynamic scenes , 2006, Vision Research.

[35]  Patrick Le Callet,et al.  A coherent computational approach to model bottom-up visual attention , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Pietro Perona,et al.  Graph-Based Visual Saliency , 2006, NIPS.

[37]  Antonio Criminisi,et al.  TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-class Object Recognition and Segmentation , 2006, ECCV.

[38]  Tsuhan Chen,et al.  A Topic-Motion Model for Unsupervised Video Object Discovery , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[39]  Alexandre Bur,et al.  Dynamic visual attention: competitive versus motion priority scheme , 2007, ICVS 2007.

[40]  Anat Levin,et al.  Learning to Combine Bottom-Up and Top-Down Segmentation , 2006, International Journal of Computer Vision.

[41]  Christoph H. Lampert,et al.  Beyond sliding windows: Object localization by efficient subwindow search , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[42]  Laurent Itti,et al.  Interesting objects are visually salient. , 2008, Journal of vision.

[43]  Antonio Torralba,et al.  SIFT Flow: Dense Correspondence across Different Scenes , 2008, ECCV.

[44]  Christof Koch,et al.  A Model of Saliency-Based Visual Attention for Rapid Scene Analysis , 2009 .

[45]  Guillermo Sapiro,et al.  Video SnapCut: robust video object cutout using localized classifiers , 2009, SIGGRAPH 2009.

[46]  Pieter Peers,et al.  SubEdit: a representation for editing measured heterogeneous subsurface scattering , 2009, SIGGRAPH 2009.

[47]  Pierre Baldi,et al.  Bayesian surprise attracts human attention , 2005, Vision Research.

[48]  Frédo Durand,et al.  Learning to predict where humans look , 2009, 2009 IEEE 12th International Conference on Computer Vision.