Exploring multi-modality structure for cross domain adaptation in video concept annotation

Domain-adaptive video concept detection and annotation has recently received significant attention. However, existing video adaptation methods treat all features as a single modality and typically ignore multi-modality, a unique and important property of video data. To fill this gap, we propose a novel approach, multi-modality transfer based on multi-graph optimization (MMT-MGO), which leverages multi-modality knowledge generalized by auxiliary classifiers in the source domains to assist multi-graph optimization (a graph-based semi-supervised learning method) in the target domain for video concept annotation. To the best of our knowledge, this is the first work to introduce multi-modality transfer into domain-adaptive video concept detection and annotation. Moreover, we propose an efficient incremental extension scheme that sequentially estimates small batches of newly emerging data without modifying the structure of the multi-graph scheme. The proposed scheme achieves accuracy comparable to that of a brand-new round of optimization, which combines the new data with the data corpus from the most recent round of optimization, while greatly reducing estimation time. Extensive experiments on the TRECVID 2005-2007 data sets demonstrate the effectiveness of both the multi-modality transfer scheme and the incremental extension scheme.
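The general idea of graph-based semi-supervised learning over several modality graphs, with source-domain classifier scores used as soft priors, can be sketched as follows. This is only a minimal illustration under stated assumptions and is not the MMT-MGO formulation itself: it uses standard local-and-global-consistency style propagation, fixed modality weights, and a simple affinity-weighted average for scoring a new sample without rebuilding the graphs. All names, the toy features, and the parameters `alpha`, `beta`, and `sigma` are hypothetical.

```python
import math

def rbf(x, p, sigma):
    """Gaussian affinity between two feature vectors."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, p)) / (2 * sigma ** 2))

def normalized_graph(points, sigma=1.0):
    """Symmetrically normalized affinity matrix D^-1/2 W D^-1/2 for one modality."""
    n = len(points)
    w = [[rbf(points[i], points[j], sigma) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    deg = [sum(row) for row in w]
    return [[w[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def propagate(graphs, weights, y0, alpha=0.8, iters=200):
    """Iterate f <- alpha * S f + (1 - alpha) * y0, where S is the weighted sum of graphs."""
    n = len(y0)
    s = [[sum(wg * g[i][j] for wg, g in zip(weights, graphs)) for j in range(n)]
         for i in range(n)]
    f = list(y0)
    for _ in range(iters):
        f = [alpha * sum(s[i][j] * f[j] for j in range(n)) + (1 - alpha) * y0[i]
             for i in range(n)]
    return f

def extend(new_views, views, weights, f, sigma=1.0):
    """Score a new sample as an affinity-weighted average of existing scores.

    The existing graphs and scores are left untouched (assumed stand-in for
    an incremental extension step).
    """
    num = den = 0.0
    for wg, x, pts in zip(weights, new_views, views):
        for p, fj in zip(pts, f):
            k = wg * rbf(x, p, sigma)
            num += k * fj
            den += k
    return num / den

# Toy target-domain data: six samples described by two hypothetical modalities.
color = [[0.0], [0.2], [0.4], [3.0], [3.2], [3.4]]
texture = [[1.0], [1.1], [0.9], [-1.0], [-1.2], [-0.9]]
views = [color, texture]
weights = [0.5, 0.5]  # assumed fixed modality weights

# Two labeled samples (+1 and -1); the rest are unlabeled.
labels = [1.0, None, None, None, None, -1.0]
# Hypothetical scores from auxiliary classifiers trained on source domains.
source_scores = [0.6, 0.5, 0.2, -0.3, -0.4, -0.7]
beta = 0.3  # assumed trust placed in the source-domain prior

# Initial scores: true label where known, scaled source prior otherwise.
y0 = [lab if lab is not None else beta * s for lab, s in zip(labels, source_scores)]

graphs = [normalized_graph(v) for v in views]
scores = propagate(graphs, weights, y0)

# A newly arriving sample, one feature vector per modality.
new_score = extend([[0.1], [1.05]], views, weights, scores)
```

In this sketch the sign of each entry in `scores` gives the predicted concept label for a target sample, and `extend` scores a new sample in time linear in the corpus size rather than repeating the full propagation.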
