A Semi-supervised Learning Approach to Object Recognition with Spatial Integration of Local Features and Segmentation Cues

This chapter presents a principled way of formulating models for automatic local feature selection in object class recognition when there is little supervised data. Moreover, it discusses how one could formulate sensible spatial image context models using a conditional random field for integrating local features and segmentation cues (superpixels). By adopting sparse kernel methods and Bayesian model selection and data association, the proposed model identifies the most relevant sets of local features for recognizing object classes, achieves performance comparable to the fully supervised setting, and consistently outperforms existing methods for image classification.

[1]  Thomas Serre,et al.  Object recognition with features inspired by visual cortex , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[2]  David J. Freedman,et al.  Visual Categorization: How the Monkey Brain Does It , 2002, Biologically Motivated Computer Vision.

[3]  Thomas G. Dietterich,et al.  Solving the Multiple Instance Problem with Axis-Parallel Rectangles , 1997, Artif. Intell..

[4]  Jiří Matas,et al.  Computer Vision - ECCV 2004 , 2004, Lecture Notes in Computer Science.

[5]  Tony Lindeberg,et al.  Feature Detection with Automatic Scale Selection , 1998, International Journal of Computer Vision.

[6]  Dan Roth,et al.  Learning a Sparse Representation for Object Detection , 2002, ECCV.

[7]  Cordelia Schmid,et al.  A Performance Evaluation of Local Descriptors , 2005, IEEE Trans. Pattern Anal. Mach. Intell..

[8]  Cordelia Schmid,et al.  Human Detection Based on a Probabilistic Assembly of Robust Part Detectors , 2004, ECCV.

[9]  Thomas Hofmann,et al.  Multiple instance learning with generalized support vector machines , 2002, AAAI/IAAI.

[10]  Seong-Whan Lee,et al.  Biologically Motivated Computer Vision , 2002, Lecture Notes in Computer Science.

[11]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[12]  Antonio Torralba,et al.  Context-based vision system for place and object recognition , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[13]  David A. Forsyth,et al.  Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary , 2002, ECCV.

[14]  C. Robert,et al.  Computational and Inferential Difficulties with Mixture Posterior Distributions , 2000 .

[15]  Kotagiri Ramamohanarao,et al.  Sparse Bayesian Learning for Regression and Classification using Markov Chain Monte Carlo , 2002, ICML.

[16]  Nando de Freitas,et al.  A Statistical Model for General Contextual Object Recognition , 2004, ECCV.

[17]  Mads Nielsen,et al.  Computer Vision — ECCV 2002 , 2002, Lecture Notes in Computer Science.

[18]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[19]  C. Schmid,et al.  Indexing based on scale invariant interest points , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[20]  Jitendra Malik,et al.  Learning a classification model for segmentation , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[21]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[22]  Pietro Perona,et al.  Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[23]  Peter Auer,et al.  Weak Hypotheses and Boosting for Generic Object Detection and Recognition , 2004, ECCV.

[24]  D. McFadden A Method of Simulated Moments for Estimation of Discrete Response Models Without Numerical Integration , 1989 .

[25]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[26]  Nando de Freitas,et al.  Learning about Individuals from Group Statistics , 2005, UAI.

[27]  Yee Whye Teh,et al.  Faces and names in the news , 2004, CVPR 2004.

[28]  Nando de Freitas,et al.  From Fields to Trees , 2004, UAI.

[29]  Nando de Freitas,et al.  A Constrained Semi-supervised Learning Approach to Data Association , 2004, ECCV.

[30]  Zoubin Ghahramani,et al.  Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions , 2003, ICML 2003.

[31]  Cordelia Schmid,et al.  Selection of scale-invariant parts for object class recognition , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[32]  Bernt Schiele,et al.  Pedestrian detection in crowded scenes , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[33]  Nando de Freitas,et al.  Bayesian Feature Weighting for Unsupervised Learning, with Application to Object Recognition , 2003, AISTATS.

[34]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[35]  Michael Brady,et al.  Saliency, Scale and Image Description , 2001, International Journal of Computer Vision.

[36]  George Eastman House,et al.  Sparse Bayesian Learning and the Relevance Vector Machine , 2001 .