Category Level Object Segmentation by Combining Bag-of-Words Models with Dirichlet Processes and Random Fields

This paper addresses the problem of accurately segmenting instances of object classes in images without any human interaction. Our model combines a bag-of-words recognition component with spatial regularization based on a random field and a Dirichlet process mixture. Bag-of-words models successfully predict the presence of an object within an image; however, they can not accurately locate object boundaries. Random Fields take into account the spatial layout of images and provide local spatial regularization. Yet, as they use local coupling between image labels, they fail to capture larger scale structures needed for object recognition. These components are combined with a Dirichlet process mixture. It models images as a composition of regions, each representing a single object instance. Gibbs sampling is used for parameter estimations and object segmentation.Our model successfully segments object category instances, despite cluttered backgrounds and large variations in appearance and viewpoints. The strengths and limitations of our model are shown through extensive experimental evaluations. First, we evaluate the result of two methods to build visual vocabularies. Second, we show how to combine strong labeling (segmented images) with weak labeling (images annotated with bounding boxes), in order to limit the labeling effort needed to learn the model. Third, we study the effect of different initializations. We present results on four image databases, including the challenging PASCAL VOC 2007 data set on which we obtain state-of-the art results.

[1]  Bastian Leibe,et al.  Interleaved Object Categorization and Segmentation , 2003, BMVC.

[2]  Pierre Geurts,et al.  Extremely randomized trees , 2006, Machine Learning.

[3]  Frédéric Jurie,et al.  Latent mixture vocabularies for object categorization and segmentation , 2006, Image Vis. Comput..

[4]  Anat Levin,et al.  Learning to Combine Bottom-Up and Top-Down Segmentation , 2006, ECCV.

[5]  Cordelia Schmid,et al.  Object Recognition by Integrating Multiple Image Segmentations , 2008, ECCV.

[6]  Gabriela Csurka,et al.  A Simple High Performance Approach to Semantic Segmentation , 2008, BMVC.

[7]  IEEE conference on computer vision and pattern recognition , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[8]  Joachim M. Buhmann,et al.  Smooth Image Segmentation by Nonparametric Bayesian Inference , 2006, ECCV.

[9]  Leo Breiman,et al.  Classification and Regression Trees , 1984 .

[10]  Andrew Zisserman,et al.  OBJ CUT , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[11]  M. Escobar,et al.  Markov Chain Sampling Methods for Dirichlet Process Mixture Models , 2000 .

[12]  Antonio Torralba,et al.  Describing Visual Scenes Using Transformed Objects and Parts , 2008, International Journal of Computer Vision.

[13]  Christopher K. I. Williams,et al.  Image Modeling with Position-Encoding Dynamic Trees , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[14]  Pietro Perona,et al.  Learning object categories from Google's image search , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[15]  Jitendra Malik,et al.  Shape Guided Object Segmentation , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[16]  Nebojsa Jojic,et al.  LOCUS: learning object classes with unsupervised segmentation , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[17]  Thomas Hofmann,et al.  Unsupervised Learning by Probabilistic Latent Semantic Analysis , 2004, Machine Learning.

[18]  Marie-Pierre Jolly,et al.  Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[19]  Roberto Cipolla,et al.  Semantic texton forests for image categorization and segmentation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Li Fei-Fei,et al.  Spatially coherent latent topic model for concurrent object segmentation and classification , 2007 .

[21]  Martial Hebert,et al.  Discriminative Random Fields , 2006, International Journal of Computer Vision.

[22]  Fei-Fei Li,et al.  Spatially Coherent Latent Topic Model for Concurrent Segmentation and Classification of Objects and Scenes , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[23]  Andrew Blake,et al.  "GrabCut" , 2004, ACM Trans. Graph..

[24]  Frédéric Jurie,et al.  Creating efficient codebooks for visual recognition , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[25]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[26]  Cordelia Schmid,et al.  Coloring Local Feature Extraction , 2006, ECCV.

[27]  Cordelia Schmid,et al.  Human Detection Using Oriented Histograms of Flow and Appearance , 2006, ECCV.

[28]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[29]  Frédéric Jurie,et al.  Randomized Clustering Forests for Image Classification , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Jamie Shotton,et al.  The Layout Consistent Random Field for Recognizing and Segmenting Partially Occluded Objects , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[31]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[32]  Donald Geman,et al.  Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images , 1984 .

[33]  Frédéric Jurie,et al.  Combining appearance models and Markov Random Fields for category level object segmentation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Jian Sun,et al.  Lazy snapping , 2004, SIGGRAPH 2004.

[35]  Radford M. Neal Markov Chain Sampling Methods for Dirichlet Process Mixture Models , 2000 .

[36]  Bill Triggs,et al.  Region Classification with Markov Field Aspect Models , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Bill Triggs,et al.  Scene Segmentation with CRFs Learned from Partially Labeled Images , 2007, NIPS.

[38]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[39]  Jitendra Malik,et al.  Learning to detect natural image boundaries using local brightness, color, and texture cues , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[40]  Marie-Pierre Jolly,et al.  Interactive Graph Cuts for Optimal Boundary and Region Segmentation of Objects in N-D Images , 2001, ICCV.

[41]  Antonio Criminisi,et al.  TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-class Object Recognition and Segmentation , 2006, ECCV.

[42]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..