Finding Objects of Interest in Images using Saliency and Superpixels

The ability to automatically find objects of interest in images is useful in areas such as compression, indexing and retrieval, and retargeting. There are two classes of such algorithms: those that find any object of interest with no prior knowledge, independent of the task, and those that find specific objects of interest known a priori. The former class tries to detect objects that stand out, i.e., are salient, because they differ from the rest of the image and consequently capture our attention. The detection is generic in this case, as there is no specific object we are trying to locate. The latter class detects specific known objects of interest and often requires training on features extracted from known examples. In this thesis we address various aspects of finding objects of interest under the topics of saliency detection and object detection. We present two saliency detection algorithms that rely on the principle of center-surround contrast. These two algorithms are shown to be superior to several state-of-the-art techniques in terms of precision and recall measured against a ground truth. They output full-resolution saliency maps, are simpler to implement, and are computationally more efficient than most existing algorithms. We further establish the relevance of our saliency detection algorithms by using them for the known applications of object segmentation and image retargeting. We first present three different techniques for salient object segmentation using our saliency maps, based on clustering, graph cuts, and geodesic-distance-based labeling. We then demonstrate the use of our saliency maps in a popular technique for content-aware image resizing and compare the results with those of existing methods. Our saliency maps prove to be a much more effective replacement for conventional gradient maps in providing automatic content-awareness.
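The center-surround contrast principle named above can be illustrated with a minimal sketch. It is not the thesis's algorithm: it assumes a grayscale image stored as a list of rows, and the function names, window shapes, and radii are hypothetical choices made only for illustration. Each pixel's saliency is the absolute difference between the mean of a small center window and the mean of a larger surround window, which yields a map at the full resolution of the input.

```python
def window_mean(img, y, x, radius):
    """Mean intensity of the square window around (y, x), clipped at the borders."""
    h, w = len(img), len(img[0])
    y0, y1 = max(0, y - radius), min(h, y + radius + 1)
    x0, x1 = max(0, x - radius), min(w, x + radius + 1)
    total = sum(img[j][i] for j in range(y0, y1) for i in range(x0, x1))
    return total / ((y1 - y0) * (x1 - x0))


def center_surround_saliency(img, center_radius=0, surround_radius=2):
    """Per-pixel |center mean - surround mean| (illustrative assumption only).

    The result has the same height and width as the input image.
    """
    h, w = len(img), len(img[0])
    return [
        [
            abs(window_mean(img, y, x, center_radius)
                - window_mean(img, y, x, surround_radius))
            for x in range(w)
        ]
        for y in range(h)
    ]


# Usage: a dark 5x5 image with a single bright pixel in the middle.
image = [[0] * 5 for _ in range(5)]
image[2][2] = 100
saliency = center_surround_saliency(image)
```

In this toy image the bright pixel differs most from its surround, so it receives the highest saliency value, while a perfectly uniform image would produce a map of zeros.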
Just as it is important to find regions of interest in images, it is also important to find interesting images within a large collection. We therefore extend the notion of saliency detection in images to image databases and propose an algorithm for finding salient images in a database. Apart from finding such images, we also present two novel techniques for creating visually appealing summaries in the form of collages and mosaics. Finally, we address the problem of finding specific known objects of interest in images. Specifically, we deal with the feature extraction step that is a prerequisite for any technique in this domain. In this context, we first present a superpixel segmentation algorithm that outperforms previous algorithms on the quantitative measures of under-segmentation error and boundary recall. Our superpixel segmentation algorithm also offers several other advantages over existing algorithms, such as compactness, uniform size, control over the number of superpixels, and computational efficiency. We demonstrate the effectiveness of our superpixels by deploying them in existing algorithms, specifically an object class detection technique and a graph-based algorithm, and improving their performance. We also present the results of using our superpixels in a technique for detecting mitochondria in noisy medical images.
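The two superpixel quality measures mentioned above can be sketched as follows. The abstract does not give their formulas, so the definitions here are assumptions based on one common formulation: boundary recall is the fraction of ground-truth boundary pixels that lie within a small tolerance of a superpixel boundary, and under-segmentation error sums, for every ground-truth segment, the areas of the superpixels that overlap it by more than a minimum fraction, then normalizes by the image size. The function names, the tolerance, and the 5% overlap threshold are hypothetical.

```python
from collections import Counter


def boundary_pixels(labels):
    """Pixels whose right or bottom neighbor carries a different label."""
    h, w = len(labels), len(labels[0])
    pts = set()
    for y in range(h):
        for x in range(w):
            right = x + 1 < w and labels[y][x] != labels[y][x + 1]
            below = y + 1 < h and labels[y][x] != labels[y + 1][x]
            if right or below:
                pts.add((y, x))
    return pts


def boundary_recall(gt, sp, tol=1):
    """Fraction of ground-truth boundary pixels within `tol` of a superpixel boundary."""
    gt_b, sp_b = boundary_pixels(gt), boundary_pixels(sp)
    if not gt_b:
        return 1.0  # nothing to recall (convention assumed here)
    offsets = range(-tol, tol + 1)
    hits = sum(
        any((y + dy, x + dx) in sp_b for dy in offsets for dx in offsets)
        for (y, x) in gt_b
    )
    return hits / len(gt_b)


def undersegmentation_error(gt, sp, min_overlap=0.05):
    """Normalized superpixel area that 'leaks' across ground-truth segments."""
    h, w = len(gt), len(gt[0])
    n = h * w
    sp_size = Counter(sp[y][x] for y in range(h) for x in range(w))
    overlap = Counter((gt[y][x], sp[y][x]) for y in range(h) for x in range(w))
    covered = sum(
        sp_size[s]
        for (_, s), count in overlap.items()
        if count > min_overlap * sp_size[s]
    )
    return (covered - n) / n


# Usage: a 4x4 ground truth split into left and right halves.
ground_truth = [[0, 0, 1, 1] for _ in range(4)]
perfect = [row[:] for row in ground_truth]   # superpixels match the ground truth
single = [[0] * 4 for _ in range(4)]         # one superpixel covers everything
```

Under these assumed definitions, a segmentation that matches the ground truth scores a boundary recall of 1 and an under-segmentation error of 0, while a single superpixel covering the whole image misses every boundary and leaks across both segments.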
