Autonomous learning of visual concept models

As the amount of video data grows, organizing and retrieving video based on its semantics is becoming increasingly important. Traditionally, supervised learning is used to build models for detecting semantic concepts, but obtaining a substantial amount of training data for supervised schemes requires extensive manual labeling. In this paper, we propose a novel autonomous learning framework that is trained on imperfect labels extracted automatically from cross-modality information, which avoids the manual labeling process entirely. In the proposed framework, imperfect labels are first obtained from cross-modality information without user involvement. Then, using two new schemes that we propose, "generalized multiple-instance learning" and "uncertain labeling density", the system infers relevance scores for visual concepts. From these scores, support vector regression is used to build generic visual models. In preliminary experiments, we use the proposed system to learn 20 visual concepts from 6 hours of video. Compared with two concept models trained by two supervised algorithms, the autonomous learning framework achieves better average precision. Other concept models also show promising results.
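The abstract reports results as average precision but does not say which variant is used. As a minimal sketch, assuming the common non-interpolated definition (the mean of the precision values at each rank where a relevant item appears), the metric can be computed from a list of relevance scores as follows. The function name and the toy inputs are illustrative and do not come from the paper.

```python
def average_precision(relevance_scores, is_relevant):
    """Non-interpolated average precision for one concept.

    relevance_scores: predicted score per video shot (higher = more relevant).
    is_relevant: ground-truth booleans, aligned with relevance_scores.
    Assumption: this is the standard definition, not necessarily the
    exact variant used in the paper.
    """
    # Rank shots from highest to lowest predicted score.
    ranked = sorted(zip(relevance_scores, is_relevant),
                    key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, relevant) in enumerate(ranked, start=1):
        if relevant:
            hits += 1
            # Precision at this rank: relevant shots seen so far / rank.
            precision_sum += hits / rank
    # By convention here, return 0.0 when there are no relevant shots.
    return precision_sum / hits if hits else 0.0


# Toy example: relevant shots land at ranks 1 and 3.
ap = average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, False])
```

In the toy example the precision is 1/1 at rank 1 and 2/3 at rank 3, so the average precision is their mean, 5/6.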
