Combining Context, Consistency, and Diversity Cues for Interactive Image Categorization

This paper presents a novel graph-based framework that combines context, consistency, and diversity cues for interactive image categorization. Images are first represented with visual keywords, which are obtained by dividing each image into blocks and clustering these blocks. The context among visual keywords within an image is then captured by a proposed 2-D spatial Markov chain model. To build a graph-based approach to image categorization, we incorporate this intra-image context into a new class of kernels, called spatial Markov kernels, which define the affinity matrix of the graph. Once the graph is constructed with this kernel, large amounts of unlabeled data can be exploited by graph-based semi-supervised learning, namely label propagation driven by inter-image consistency. For interactive image categorization, we further combine this semi-supervised learning with active learning by defining a new diversity-based data selection criterion that uses spectral embedding. Experiments demonstrate that the proposed framework achieves promising results.
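The abstract names the pipeline stages but gives no formulas, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes the 2-D spatial Markov chain is estimated by pooling horizontal and vertical neighbor transitions between visual keywords. It replaces the spatial Markov kernel with a hypothetical Gaussian (RBF) kernel on the transition matrices. Label propagation follows the local-and-global-consistency scheme of Zhou et al. (NIPS 2003). The diversity criterion is a hypothetical farthest-point rule in a spectral embedding. The block-clustering step that produces visual keywords is skipped, and the keyword grids are synthetic toy data.

```python
import numpy as np

def spatial_markov_chain(grid, num_words, smoothing=1.0):
    """Row-stochastic transition matrix of a 2-D spatial Markov chain.

    grid: 2-D int array holding one visual-keyword index per image block.
    Transitions are counted over horizontal (left -> right) and vertical
    (top -> bottom) neighbor pairs and pooled (an illustrative assumption);
    additive smoothing keeps every row well defined.
    """
    counts = np.full((num_words, num_words), float(smoothing))
    np.add.at(counts, (grid[:, :-1].ravel(), grid[:, 1:].ravel()), 1.0)  # horizontal pairs
    np.add.at(counts, (grid[:-1, :].ravel(), grid[1:, :].ravel()), 1.0)  # vertical pairs
    return counts / counts.sum(axis=1, keepdims=True)

def markov_affinity(chains, sigma=0.3):
    """Hypothetical stand-in for the spatial Markov kernel: an RBF kernel on
    flattened transition matrices, used as the graph affinity matrix W."""
    X = np.stack([P.ravel() for P in chains])
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops in the graph
    return W

def propagate_labels(W, labels, num_classes, alpha=0.99):
    """Local and global consistency (Zhou et al., 2003):
    F = (I - alpha * S)^{-1} Y with S = D^{-1/2} W D^{-1/2}; label -1 = unlabeled."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    Y = np.zeros((len(labels), num_classes))
    idx = np.flatnonzero(labels >= 0)
    Y[idx, labels[idx]] = 1.0  # one-hot rows for labeled images
    F = np.linalg.solve(np.eye(len(labels)) - alpha * S, Y)
    return F.argmax(axis=1), S

def select_diverse(S, labels, k, dim=2):
    """Hypothetical diversity criterion: embed images with the top `dim`
    eigenvectors of S, then greedily pick k unlabeled images farthest from
    every labeled or already picked image. Requires at least one label."""
    _, vecs = np.linalg.eigh(S)  # eigenvalues come back in ascending order
    Z = vecs[:, -dim:]
    Z = Z / np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), 1e-12)
    chosen = list(np.flatnonzero(labels >= 0))
    picks = []
    for _ in range(k):
        cand = np.array([i for i in np.flatnonzero(labels < 0) if i not in picks])
        dist = np.linalg.norm(Z[cand][:, None, :] - Z[chosen][None, :, :], axis=2).min(axis=1)
        best = int(cand[np.argmax(dist)])
        picks.append(best)
        chosen.append(best)
    return picks

# Toy data (assumption): class 0 images use keywords {0, 1}, class 1 uses {2, 3}.
rng = np.random.default_rng(0)
grids = [rng.integers(0, 2, size=(8, 8)) for _ in range(10)] + \
        [rng.integers(2, 4, size=(8, 8)) for _ in range(10)]
truth = np.array([0] * 10 + [1] * 10)
labels = np.full(20, -1)
labels[[0, 10]] = truth[[0, 10]]  # one labeled image per class

W = markov_affinity([spatial_markov_chain(g, num_words=4) for g in grids])
pred, S = propagate_labels(W, labels, num_classes=2)
picks = select_diverse(S, labels, k=2)
print("accuracy:", (pred == truth).mean(), "query next:", picks)
```

Swapping `markov_affinity` for the paper's actual spatial Markov kernel would only change how W is built. The propagation and selection steps work with any symmetric, non-negative affinity matrix whose rows have positive sums.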
