Semantic video classification by integrating flexible mixture model with adaptive EM algorithm

Digital video now plays an important role in medical education and healthcare, but our ability to index video automatically at the semantic level is still primitive. In this paper, we propose a novel framework for more effective semantic video classification and indexing in a specific domain: surgery education video. The framework comprises: (a) a semantic-sensitive scheme for characterizing and representing video content using principal video shots and their perceptual multimodal features; (b) a technique for interpreting semantic medical concepts using a flexible mixture model; and (c) a semantic video classifier that uses an adaptive Expectation-Maximization (EM) algorithm for automatic parameter estimation and model selection (i.e., selecting the optimal number of Gaussian mixture components). Because the more effective video content characterization is integrated with the adaptive EM algorithm, the semantic video classifier achieves significantly higher classification accuracy. For skin classification, its accuracy is close to 95.5%. For semantic surgical video classification, it achieves an overall accuracy of approximately 84.6%.
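To make the idea of joint parameter estimation and model selection concrete, the following is a minimal sketch and not the adaptive EM algorithm of the paper, whose details the abstract does not give. It assumes one-dimensional features, fits a Gaussian mixture with plain EM for each candidate number of components, and picks the number of components with the Bayesian Information Criterion (BIC), which is a stand-in selection rule chosen here for illustration. The function names `fit_gmm_1d` and `select_num_components` are hypothetical.

```python
import math


def log_likelihood(xs, weights, means, variances):
    """Total log-likelihood of the samples under a 1-D Gaussian mixture."""
    total = 0.0
    for x in xs:
        p = sum(w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances))
        total += math.log(max(p, 1e-300))  # guard against log(0)
    return total


def fit_gmm_1d(xs, k, iters=300, tol=1e-9):
    """Fit a k-component 1-D Gaussian mixture with plain EM (illustrative only)."""
    n = len(xs)
    ordered = sorted(xs)
    mean_all = sum(xs) / n
    var_all = sum((x - mean_all) ** 2 for x in xs) / n
    floor = 1e-6 * var_all + 1e-12  # variance floor against degenerate components
    # Deterministic initialisation: means at evenly spaced quantiles.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    variances = [max(var_all, floor)] * k
    weights = [1.0 / k] * k
    prev = -math.inf
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            dens = [w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            z = max(sum(dens), 1e-300)
            resp.append([d / z for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-300)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, floor)
        cur = log_likelihood(xs, weights, means, variances)
        if cur - prev < tol:  # stop when the likelihood no longer improves
            break
        prev = cur
    return weights, means, variances, log_likelihood(xs, weights, means, variances)


def select_num_components(xs, k_max):
    """Return (best_k, {k: BIC}) over k = 1..k_max; lower BIC is better."""
    n = len(xs)
    bic = {}
    for k in range(1, k_max + 1):
        loglik = fit_gmm_1d(xs, k)[3]
        num_params = 3 * k - 1  # k means, k variances, k - 1 free weights
        bic[k] = -2.0 * loglik + num_params * math.log(n)
    best_k = min(bic, key=bic.get)
    return best_k, bic
```

Calling `select_num_components(samples, 4)` on a list of scalar feature values returns the chosen component count together with the BIC score of every candidate. An exhaustive search over component counts is the simplest way to show model selection; an adaptive scheme would instead adjust the number of components during estimation.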
