Region Classification with Markov Field Aspect Models

Considerable advances have been made in learning to recognize and localize visual object classes. Simple bag-of-feature approaches label each pixel or patch independently. More advanced models attempt to improve the coherence of the labellings by introducing some form of inter-patch coupling: traditional spatial models such as MRF's provide crisper local labellings by exploiting neighbourhood-level couplings, while aspect models such as PLSA and LDA use global relevance estimates (global mixing proportions for the classes appearing in the image) to shape the local choices. We point out that the two approaches are complementary, combining them to produce aspect-based spatial field models that outperform both approaches. We study two spatial models: one based on averaging over forests of minimal spanning trees linking neighboring image regions, the other on an efficient chain-based Expectation Propagation method for regular 8-neighbor Markov random fields. The models can be trained using either patch-level labels or image-level keywords. As input features they use factored observation models combining texture, color and position cues. Experimental results on the MSR Cambridge data sets show that combining spatial and aspect models significantly improves the region-level classification accuracy. In fact our models trained with image-level labels outperform PLSA trained with pixel-level ones.

[1]  S. C. Pearce,et al.  Field experimentation with fruit trees and other perennial plants , 1953 .

[2]  Emanuele Trucco,et al.  Robust motion and correspondence of noisy 3-D point sets with missing data , 1999, Pattern Recognit. Lett..

[3]  Joachim M. Buhmann,et al.  Histogram clustering for unsupervised segmentation and image retrieval , 1999, Pattern Recognit. Lett..

[4]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[5]  Cordelia Schmid,et al.  Affine-invariant local descriptors and neighborhood statistics for texture recognition , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[6]  Michael I. Jordan,et al.  Modeling annotated data , 2003, SIGIR.

[7]  David A. Forsyth,et al.  Matching Words and Pictures , 2003, J. Mach. Learn. Res..

[8]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[9]  R. Zemel,et al.  Multiscale conditional random fields for image labeling , 2004, CVPR 2004.

[10]  Miguel Á. Carreira-Perpiñán,et al.  Multiscale conditional random fields for image labeling , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[11]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[12]  Nando de Freitas,et al.  From Fields to Trees , 2004, UAI.

[13]  Nando de Freitas,et al.  A Statistical Model for General Contextual Object Recognition , 2004, ECCV.

[14]  Thomas Hofmann,et al.  Unsupervised Learning by Probabilistic Latent Semantic Analysis , 2004, Machine Learning.

[15]  Martial Hebert,et al.  A hierarchical field framework for unified context-based classification , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[16]  Yee Whye Teh,et al.  Structured Region Graphs: Morphing EP into GBP , 2005, UAI.

[17]  Alexei A. Efros,et al.  Discovering objects and their location in images , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[18]  Thomas P. Minka,et al.  Divergence measures and message passing , 2005 .

[19]  Luc Van Gool,et al.  Modeling scenes with local descriptors and latent aspects , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[20]  Andrew Zisserman,et al.  Scene Classification Via pLSA , 2006, ECCV.

[21]  Jean-Marc Odobez,et al.  Integrating Co-Occurrence and Spatial Contexts on PatchBased Scene Segmentation , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[22]  Alexei A. Efros,et al.  Using Multiple Segmentations to Discover Objects and their Extent in Image Collections , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[23]  Ankur Agarwal,et al.  Hyperfeatures - Multilevel Local Coding for Visual Recognition , 2006, ECCV.

[24]  Frédéric Jurie,et al.  Latent mixture vocabularies for object categorization and segmentation , 2006, Image Vis. Comput..

[25]  Cordelia Schmid,et al.  Coloring Local Feature Extraction , 2006, ECCV.

[26]  Antonio Criminisi,et al.  TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-class Object Recognition and Segmentation , 2006, ECCV.

[27]  Radford M. Neal Pattern Recognition and Machine Learning , 2007, Technometrics.