Unsupervised Learning of Categorical Segments in Image Collections

Which comes first: segmentation or recognition? We propose a unified framework that carries out both simultaneously and without supervision. The framework combines a flexible probabilistic model, which represents the shape and appearance of each segment, with the popular “bag of visual words” model for recognition. Applied to a collection of images, the framework simultaneously discovers the segments of each image and the correspondences between those segments. Such recurring segments may be thought of as the “parts” of corresponding objects that appear multiple times in the image collection. The model may therefore be used for learning new categories, detecting and classifying objects, and segmenting images without expensive human annotation.
