Active Context-Based Concept Fusion with Partial User Labels

In this paper we propose a new framework, called active context-based concept fusion, for effectively improving the accuracy of semantic concept detection in images and videos. Our approach solicits user annotations for a small number of concepts and uses them to refine the detection of the remaining concepts. In contrast with conventional methods, our approach is active: it uses information-theoretic criteria to automatically determine the optimal concepts for user annotation. Our experiments on the TRECVID 2005 development set (about 80 hours of video) show significant performance gains. In addition, we have developed an effective method to predict which concepts are likely to benefit from context-based fusion.
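The abstract does not specify which information-theoretic criterion selects the concepts handed to the annotator. A minimal sketch of the active-selection idea, assuming binary detector posteriors and using binary entropy as a stand-in uncertainty measure (the function and variable names here are illustrative, not from the paper):

```python
import math

def binary_entropy(p):
    """Binary entropy (in bits) of a detector's posterior probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def select_concepts_for_annotation(posteriors, k):
    """Pick the k concepts whose current detections are most uncertain
    (highest entropy) -- one plausible information-theoretic criterion
    for deciding which concepts to solicit user labels for."""
    ranked = sorted(posteriors, key=lambda c: binary_entropy(posteriors[c]),
                    reverse=True)
    return ranked[:k]

# Hypothetical detector posteriors for one video shot.
posteriors = {"outdoor": 0.5, "face": 0.95, "car": 0.6, "sky": 0.99}
chosen = select_concepts_for_annotation(posteriors, 2)
print(chosen)  # the two most uncertain concepts
```

Once the user labels the selected concepts, those labels become fixed evidence and the posteriors of the remaining concepts are re-estimated through the context (inter-concept) model.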
