Semantic video classification with insufficient labeled samples

To support more effective video retrieval at the semantic level, we introduce a novel framework for semantic video classification. The framework includes: (a) a semantic-sensitive video content representation framework based on principal video shots, which enhances the quality of features (i.e., the ability of the selected low-level multimodal perceptual features to discriminate among various semantic video concepts); (b) a semantic video concept interpretation framework based on a flexible mixture model, which bridges the semantic gap between semantic video concepts and low-level multimodal perceptual features; and (c) a novel concept learning technique that integrates unlabeled samples with labeled samples for more accurate classifier training. Experimental results on semantic medical video classification are presented to evaluate the performance of the proposed framework.
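As a rough illustration of item (c), the sketch below shows one common way to combine labeled and unlabeled samples: expectation-maximization over a class-conditional Gaussian model. It is not the paper's flexible mixture model. It assumes a single diagonal-covariance Gaussian per concept, and the names `fit_semi_supervised`, `predict`, and `var_floor` are hypothetical, chosen only for this example.

```python
import numpy as np

def _m_step(X, R, var_floor):
    """Re-estimate class priors, means and diagonal variances from soft labels R."""
    weight = R.sum(axis=0)                          # effective sample count per class
    priors = weight / weight.sum()
    means = (R.T @ X) / weight[:, None]
    second_moment = (R.T @ (X ** 2)) / weight[:, None]
    variances = np.maximum(second_moment - means ** 2, var_floor)
    return priors, means, variances

def _log_joint(X, priors, means, variances):
    """Return log p(x, class) for every sample (rows) and class (columns)."""
    diff = X[:, None, :] - means[None, :, :]
    log_lik = -0.5 * np.sum(diff ** 2 / variances + np.log(2 * np.pi * variances), axis=2)
    return log_lik + np.log(priors)

def fit_semi_supervised(X_lab, y_lab, X_unl, n_classes, n_iter=20, var_floor=1e-6):
    """EM with fixed labels for labeled samples and soft labels for unlabeled ones.

    Assumption: every class has at least one labeled sample.
    """
    onehot = np.eye(n_classes)[y_lab]
    params = _m_step(X_lab, onehot, var_floor)      # initialise from labeled data only
    X_all = np.vstack([X_lab, X_unl])
    for _ in range(n_iter):
        # E-step: class posteriors for the unlabeled samples only.
        log_joint = _log_joint(X_unl, *params)
        log_joint -= log_joint.max(axis=1, keepdims=True)   # numerical stability
        post = np.exp(log_joint)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: labeled samples keep their one-hot labels.
        params = _m_step(X_all, np.vstack([onehot, post]), var_floor)
    return params

def predict(X, params):
    """Assign each sample to the class with the highest joint log-probability."""
    return np.argmax(_log_joint(X, *params), axis=1)
```

Here `X_lab` and `X_unl` would hold feature vectors for labeled and unlabeled samples, and `y_lab` integer concept indices. The labeled samples anchor each class, while the unlabeled samples refine the estimated means and variances through their posterior weights.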
