Interactive Video Indexing With Statistical Active Learning

Video indexing, also called video concept detection, has attracted increasing attention from both academia and industry. To reduce the cost of human labeling, active learning has recently been introduced to video indexing. In this paper, we propose a novel active learning approach based on the optimum experimental design criteria from statistics. Unlike existing optimum experimental design methods, our approach simultaneously exploits each sample's local structure together with sample relevance, density, and diversity information, and it makes use of both labeled and unlabeled data. Specifically, we develop a local learning model to exploit the local structure of each sample, under the assumption that a sample's label can be well estimated from its neighbors. By globally aligning the local models of all samples, we obtain a local learning regularizer, on which we build a local learning regularized least squares model. Finally, we develop a unified sample selection approach for interactive video indexing that takes into account sample relevance, density, and diversity, as well as each sample's efficacy in minimizing the parameter variance of the proposed local learning regularized least squares model. We compare our approach with state-of-the-art approaches on the TREC Video Retrieval Evaluation (TRECVID) benchmark, where the proposed approach achieves superior performance.
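
To make the local learning idea concrete, the following is a minimal sketch under stated assumptions, not the paper's actual formulation. It assumes linear local models fitted by ridge regression on each sample's k nearest neighbors, aligns them into a single matrix M so that f^T M f measures how far the labels f are from their local predictions, and then plugs M into a regularized least squares fit. The function names and the parameters k, lam, mu, and gamma are hypothetical and chosen only for illustration.

```python
import numpy as np

def local_learning_regularizer(X, k=5, lam=1.0):
    """Build M = (I - A)^T (I - A), where row i of A holds the weights that
    predict sample i's label from the labels of its k nearest neighbors.
    Assumption: each local model is a linear ridge regressor."""
    n, d = X.shape
    # Pairwise squared Euclidean distances; the diagonal is masked so that
    # a sample is never chosen as its own neighbor.
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(dist, np.inf)

    A = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dist[i])[:k]
        Xn = X[nbrs]                          # (k, d) neighbor features
        G = Xn.T @ Xn + lam * np.eye(d)       # ridge-regularized Gram matrix
        # Local prediction o_i = x_i^T G^{-1} Xn^T f_nbrs = alpha^T f_nbrs
        A[i, nbrs] = Xn @ np.linalg.solve(G, X[i])

    R = np.eye(n) - A
    return R.T @ R                            # f^T M f = ||f - A f||^2

def fit_regularized_ls(X, labeled_idx, y, M, mu=0.1, gamma=0.01):
    """Least squares on the labeled samples, with the local learning
    regularizer evaluated on all (labeled and unlabeled) samples.
    Assumption: a linear global model f = X w."""
    d = X.shape[1]
    Xl = X[labeled_idx]
    H = Xl.T @ Xl + mu * (X.T @ M @ X) + gamma * np.eye(d)
    return np.linalg.solve(H, Xl.T @ y)
```

Under these assumptions, the matrix H formed inside `fit_regularized_ls` is the kind of quantity an experimental design criterion would act on: choosing which samples to label changes `Xl`, and with it the variance of the estimated parameters. The relevance, density, and diversity terms mentioned in the abstract are not modeled here.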
