Segmentation-based multi-class semantic object detection

In this paper, we study the problem of detecting semantic objects from known categories in images. Unlike existing techniques, which perform recognition at the pixel or patch level, we propose to rely on the categorization of image segments. Recent work has highlighted that image segments provide sound support for visual object class recognition. In this work, we use image segments as primitives to extract robust features and to train detection models for a predefined set of categories. We benchmark several segmentation algorithms and compare their performance for segment recognition. We then propose two methods for improving segment classification: the first fuses the classification results obtained with the different segmentations, and the second optimizes the global labelling by correcting local ambiguities between neighboring segments. Using the Microsoft MSRC-21 image database as a benchmark, we show that our method is competitive with the current state of the art.
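The abstract does not state how the classification results of the different segmentations are fused. As an illustration only, here is a minimal Python sketch under two assumptions: each segmentation supplies a label map and a per-segment table of class probabilities (for instance, calibrated classifier scores), and fusion averages, at every pixel, the probabilities of the segments covering it before picking the most probable class. The function name `fuse_segmentations` and the list-of-lists data layout are hypothetical, not the paper's method.

```python
from typing import List, Sequence

# Assumed layout (hypothetical, for illustration):
#   label_map[y][x]    -> id of the segment covering pixel (x, y)
#   probs[seg_id][c]   -> estimated P(class c | segment seg_id)


def fuse_segmentations(
    label_maps: Sequence[List[List[int]]],
    segment_probs: Sequence[List[List[float]]],
) -> List[List[int]]:
    """Per-pixel class map obtained by averaging, over all segmentations,
    the class probabilities of the segment that contains each pixel."""
    if not label_maps or len(label_maps) != len(segment_probs):
        raise ValueError("need one probability table per segmentation")
    height = len(label_maps[0])
    width = len(label_maps[0][0])
    n_classes = len(segment_probs[0][0])
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Sum the posteriors of the covering segment in each segmentation.
            scores = [0.0] * n_classes
            for label_map, probs in zip(label_maps, segment_probs):
                seg = label_map[y][x]
                for c in range(n_classes):
                    scores[c] += probs[seg][c]
            # Dividing by the number of segmentations would not change the
            # argmax, so the summed scores are enough; ties go to the lower class.
            result[y][x] = max(range(n_classes), key=lambda c: scores[c])
    return result


# Two segmentations of a 2x3 image, two classes.
maps = [[[0, 0, 1], [0, 1, 1]], [[0, 1, 1], [0, 0, 1]]]
probs = [[[0.9, 0.1], [0.4, 0.6]], [[0.7, 0.3], [0.2, 0.8]]]
print(fuse_segmentations(maps, probs))  # [[0, 0, 1], [0, 0, 1]]
```

Averaging posteriors lets a segmentation that is confident about a pixel override one that is ambiguous there; other fusion rules (voting, weighted combination, a learned fusion classifier) would fit the same interface.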
