Stel Component Analysis: Joint Segmentation, Modeling and Recognition of Objects Classes

Models that captures the common structure of an object class have appeared few years ago in the literature (Jojic and Caspi in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 212–219, 2004; Winn and Jojic in Proceedings of International Conference on Computer Vision (ICCV), pp. 756–763, 2005); they are often referred as “stel models.” Their main characteristic is to segment objects in clear, often semantic, parts as a consequence of the modeling constraint which forces the regions belonging to a single segment to have a tight distribution over local measurements, such as color or texture. This self-similarity within a region in a single image is typical of many meaningful image parts, even when across different images of similar objects, the corresponding parts may not have similar local measurements. Moreover, the segmentation itself is expected to be consistent within a class, although still flexible. These models have been applied mostly to segmentation scenarios.In this paper, we extent those ideas (1) proposing to capture correlations that exist in structural elements of an image class due to global effects, (2) exploiting the segmentations to capture feature co-occurrences and (3) allowing the use of multiple, eventually sparse, observation of different nature. In this way we obtain richer models more suitable to recognition tasks.We accomplish these requirements using a novel approach we dubbed stel component analysis. Experimental results show the flexibility of the model as it can deal successfully with image/video segmentation and object recognition where, in particular, it can be used as an alternative of, or in conjunction with, bag-of-features and related classifiers, where stel inference provides a meaningful spatial partition of features.

[1]  Michal Irani,et al.  Detecting and sketching the common , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[2]  Antonio Criminisi,et al.  Epitomic Location Recognition , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Eli Shechtman,et al.  In defense of Nearest-Neighbor based image classification , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Jitendra Malik,et al.  Geometric blur for template matching , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[5]  Alessandro Perina,et al.  Geo-located image analysis using latent representations , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Shimon Ullman,et al.  Learning to Segment , 2004, ECCV.

[7]  Cordelia Schmid,et al.  Spatial Weighting for Bag-of-Features , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[8]  Andrew Zisserman,et al.  Scene Classification Via pLSA , 2006, ECCV.

[9]  B. Frey,et al.  Transformation-Invariant Clustering Using the EM Algorithm , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[10]  Silvio Savarese,et al.  3D generic object categorization, localization and pose estimation , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[11]  Nebojsa Jojic,et al.  LOCUS: learning object classes with unsupervised segmentation , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[12]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[13]  Silvio Savarese,et al.  A multi-view probabilistic model for 3D object classes , 2009, CVPR.

[14]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[15]  Dariu Gavrila,et al.  An Experimental Study on Pedestrian Classification , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Alexei A. Efros,et al.  Discovering objects and their location in images , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[17]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[18]  Antonio Torralba,et al.  Learning hierarchical models of scenes, objects, and parts , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[19]  Nebojsa Jojic,et al.  Object Recognition with Hierarchical Stel Models , 2010, ECCV.

[20]  Wray L. Buntine Variational Extensions to EM and Multinomial PCA , 2002, ECML.

[21]  Vicki Bruce,et al.  Face Recognition: From Theory to Applications , 1999 .

[22]  Wen Gao,et al.  Group-sensitive multiple kernel learning for object categorization , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[23]  Sanjoy Dasgupta,et al.  A Generalization of Principal Components Analysis to the Exponential Family , 2001, NIPS.

[24]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[25]  Shuicheng Yan,et al.  Spatialized epitome and its applications , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[26]  Yaron Caspi,et al.  Probabilistic Index Maps for Modeling Natural Signals , 2004, UAI.

[27]  Pietro Perona,et al.  One-shot learning of object categories , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Eli Shechtman,et al.  Matching Local Self-Similarities across Images and Videos , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Fei-FeiLi,et al.  One-Shot Learning of Object Categories , 2006 .

[30]  Andrew McCallum,et al.  Rethinking LDA: Why Priors Matter , 2009, NIPS.

[31]  Andrew Zisserman,et al.  Advances in Neural Information Processing Systems (NIPS) , 2007 .

[32]  Andrew P. Sage,et al.  Uncertainty in Artificial Intelligence , 1987, IEEE Transactions on Systems, Man, and Cybernetics.

[33]  Bernt Schiele,et al.  International Journal of Computer Vision manuscript No. (will be inserted by the editor) Semantic Modeling of Natural Scenes for Content-Based Image Retrieval , 2022 .

[34]  Nebojsa Jojic,et al.  Structural epitome: a way to summarize one's visual experience , 2010, NIPS.

[35]  Silvio Savarese,et al.  Learning a dense multi-view representation for detection, viewpoint classification and synthesis of object categories , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[36]  Sebastian Nowozin,et al.  On feature combination for multiclass object classification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[37]  Andrew Zisserman,et al.  Multiple kernels for object detection , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[38]  Pietro Perona,et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[39]  Jitendra Malik,et al.  SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[40]  D. B. Graham,et al.  Characterising Virtual Eigensignatures for General Purpose Face Recognition , 1998 .

[41]  SchieleBernt,et al.  Semantic Modeling of Natural Scenes for Content-Based Image Retrieval , 2007 .

[42]  Jiri Matas,et al.  Robust wide-baseline stereo from maximally stable extremal regions , 2004, Image Vis. Comput..

[43]  Andrew Zisserman,et al.  Image Classification using Random Forests and Ferns , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[44]  Nebojsa Jojic,et al.  Image analysis by counting on a grid , 2011, CVPR 2011.

[45]  Nebojsa Jojic,et al.  Capturing image structure with probabilistic index maps , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[46]  Thomas Hofmann,et al.  Unsupervised Learning by Probabilistic Latent Semantic Analysis , 2004, Machine Learning.

[47]  Brendan J. Frey,et al.  Epitomic analysis of appearance and shape , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[48]  Thomas Deselaers,et al.  Global and efficient self-similarity for object classification and detection , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[49]  Antonio Criminisi,et al.  TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-class Object Recognition and Segmentation , 2006, ECCV.

[50]  Nebojsa Jojic,et al.  Capturing long-range correlations with patch models , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[51]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[52]  Nebojsa Jojic,et al.  Escaping local minima through hierarchical model selection: Automatic object discovery, segmentation, and tracking in video , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[53]  Thomas Deselaers,et al.  What is an object? , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[54]  Irfan A. Essa,et al.  Tree-based Classifiers for Bilayer Video Segmentation , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[55]  Christopher Hunt,et al.  Notes on the OpenSURF Library , 2009 .

[56]  Fei-Fei Li,et al.  Spatially Coherent Latent Topic Model for Concurrent Segmentation and Classification of Objects and Scenes , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[57]  Andrew Blake,et al.  "GrabCut" , 2004, ACM Trans. Graph..

[58]  Trevor Darrell,et al.  Hidden Conditional Random Fields , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[59]  Alessandro Perina,et al.  2LDA: Segmentation for Recognition , 2010, 2010 20th International Conference on Pattern Recognition.

[60]  Brendan J. Frey,et al.  Stel component analysis: Modeling spatial correlations in image class structure , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[61]  Yi Yang,et al.  Layered object detection for multi-class segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[62]  Christopher G. Harris,et al.  A Combined Corner and Edge Detector , 1988, Alvey Vision Conference.

[63]  Lixin Fan,et al.  Categorizing Nine Visual Classes using Local Appearance Descriptors , 2004 .

[64]  Nanning Zheng,et al.  Learning to Detect A Salient Object , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[65]  H. Sebastian Seung,et al.  Learning the parts of objects by non-negative matrix factorization , 1999, Nature.

[66]  Trevor Darrell,et al.  Conditional Random Fields for Object Recognition , 2004, NIPS.

[67]  Silvio Savarese,et al.  A multi-view probabilistic model for 3D object classes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[68]  Pietro Perona,et al.  Unsupervised Learning of Models for Recognition , 2000, ECCV.

[69]  Cordelia Schmid,et al.  Scale & Affine Invariant Interest Point Detectors , 2004, International Journal of Computer Vision.

[70]  Thomas Deselaers,et al.  ClassCut for Unsupervised Class Segmentation , 2010, ECCV.

[71]  Nanning Zheng,et al.  Learning to Detect a Salient Object , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[72]  Alessandro Perina,et al.  Learning natural scene categories by selective multi-scale feature extraction , 2010, Image Vis. Comput..

[73]  B. Schiele,et al.  Combined Object Categorization and Segmentation With an Implicit Shape Model , 2004 .

[74]  D. B. Gerham Characterizing virtual eigensignatures for general purpose face recognition , 1998 .

[75]  Nebojsa Jojic,et al.  Capturing video structure with mixture of probabilistic index maps , 2008, ECCV 2008.

[76]  Vladimir Kolmogorov,et al.  "GrabCut": interactive foreground extraction using iterated graph cuts , 2004, ACM Trans. Graph..