Semi-supervised cross feature learning for semantic concept detection in videos

For large-scale automatic semantic video characterization, it is necessary to learn and model a large number of semantic concepts. A major obstacle is the shortage of labeled training samples. Multi-view semi-supervised learning algorithms such as co-training may help by incorporating a large amount of unlabeled data. However, these algorithms assume that each view is sufficient for learning on its own, and this assumption is usually violated in semantic concept detection. In this paper, we propose a novel multi-view semi-supervised learning algorithm called semi-supervised cross feature learning (SCFL). The proposed algorithm has two advantages over co-training. First, SCFL comes with a theoretical guarantee that its performance is not significantly degraded even when the view-sufficiency assumption fails. Second, SCFL can handle additional views of unlabeled data even when these views are absent from the training data. As demonstrated on the TRECVID '03 semantic concept extraction task, SCFL not only significantly outperforms conventional co-training algorithms but also comes close to the performance that would be achieved if the unlabeled set were manually annotated and used for training along with the labeled data set.
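For readers unfamiliar with the baseline, the following is a minimal sketch of conventional two-view co-training, the method SCFL is compared against. It is not SCFL itself, whose procedure the abstract does not describe. The nearest-centroid learner, the function names (`co_train`, `predict`), and the parameters `rounds` and `per_round` are all illustrative assumptions; any per-view classifier that exposes a confidence score could be substituted.

```python
def centroid(rows):
    """Mean vector of a non-empty list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


class NearestCentroid:
    """Tiny binary classifier standing in for any per-view learner (assumption)."""

    def fit(self, X, y):
        self.pos = centroid([x for x, t in zip(X, y) if t == 1])
        self.neg = centroid([x for x, t in zip(X, y) if t == 0])
        return self

    def score(self, x):
        # Positive when x is closer to the positive centroid; magnitude = confidence.
        return sq_dist(x, self.neg) - sq_dist(x, self.pos)


def fit_view(labeled, view):
    """Train one classifier using only the features of the given view."""
    return NearestCentroid().fit([x[view] for x, _ in labeled],
                                 [t for _, t in labeled])


def co_train(labeled, unlabeled, rounds=5, per_round=1):
    """labeled: list of ((view_a, view_b), label); unlabeled: list of (view_a, view_b).

    Each round, the classifier of each view pseudo-labels its most confident
    unlabeled samples and adds them to the shared labeled set.
    """
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                break
            clf = fit_view(labeled, view)
            pool.sort(key=lambda x: abs(clf.score(x[view])), reverse=True)
            confident, pool = pool[:per_round], pool[per_round:]
            labeled += [(x, 1 if clf.score(x[view]) > 0 else 0) for x in confident]
    return [fit_view(labeled, v) for v in (0, 1)], labeled


def predict(classifiers, x):
    """Combine both views by summing their scores."""
    return 1 if sum(c.score(x[v]) for v, c in enumerate(classifiers)) > 0 else 0


# Toy data: view_a is 2-D, view_b is 1-D; both separate the two classes.
seed = [(([2.0, 2.0], [-2.0]), 1), (([-2.0, -2.0], [2.0]), 0)]
pool = [([1.5, 2.5], [-1.5]), ([2.5, 1.0], [-2.5]),
        ([-1.0, -2.0], [1.0]), ([-2.5, -1.5], [3.0])]
classifiers, grown = co_train(seed, pool)
```

In this toy setup each view separates the classes on its own, which is exactly the view-sufficiency assumption that the abstract says tends to fail for semantic concept detection. When a view is weak, its confident mistakes enter the shared labeled set and can mislead the other view.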
