Incorporating feature hierarchy and boosting to achieve more effective classifier training and concept-oriented video summarization and skimming

For online medical education purposes, we have developed a novel scheme to incorporate the results of semantic video classification to select the most representative video shots for generating concept-oriented summarization and skimming of surgery education videos. First, salient objects are used as the video patterns for feature extraction to achieve a good representation of the intermediate video semantics. The salient objects are defined as the salient video compounds that can be used to characterize the most significant perceptual properties of the corresponding real world physical objects in a video, and thus the appearances of such salient objects can be used to predict the appearances of the relevant semantic video concepts in a specific video domain. Second, a novel multi-modal boosting algorithm is developed to achieve more reliable video classifier training by incorporating feature hierarchy and boosting to dramatically reduce both the training cost and the size of training samples, thus it can significantly speed up SVM (support vector machine) classifier training. In addition, the unlabeled samples are integrated to reduce the human efforts on labeling large amount of training samples. Finally, the results of semantic video classification are incorporated to enable concept-oriented video summarization and skimming. Experimental results in a specific domain of surgery education videos are provided.

[1]  Nicu Sebe,et al.  Video retrieval and summarization , 2003, Comput. Vis. Image Underst..

[2]  Shih-Fu Chang,et al.  Video skims: taxonomies and an optimal generation framework , 2002, Proceedings. International Conference on Image Processing.

[3]  Nello Cristianini,et al.  An Introduction to Support Vector Machines and Other Kernel-based Learning Methods , 2000 .

[4]  Levent Onural,et al.  Image sequence analysis for emerging interactive multimedia services-the European COST 211 framework , 1998, IEEE Trans. Circuits Syst. Video Technol..

[5]  J. Langford,et al.  FeatureBoost: A Meta-Learning Algorithm that Improves Model Robustness , 2000, ICML.

[6]  Yoav Freund,et al.  Experiments with a New Boosting Algorithm , 1996, ICML.

[7]  Chabane Djeraba When image indexing meets knowledge discovery , 2000, MDM/KDD.

[8]  Gang Wei,et al.  Video classification based on HMM using text and faces , 2000, 2000 10th European Signal Processing Conference.

[9]  Thorsten Joachims,et al.  Transductive Inference for Text Classification using Support Vector Machines , 1999, ICML.

[10]  Ying Li,et al.  Atomic topical segments detection for instructional videos , 2006, MM '06.

[11]  John R. Kender,et al.  Video scene segmentation via continuous video coherence , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[12]  Alan Hanjalic,et al.  Automated high-level movie segmentation for advanced video-retrieval systems , 1999, IEEE Trans. Circuits Syst. Video Technol..

[13]  Yanjun Qi,et al.  Supervised classification for video shot segmentation , 2003, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698).

[14]  Rainer Lienhart,et al.  Scene Determination Based on Video and Audio Features , 2004, Multimedia Tools and Applications.

[15]  Shih-Fu Chang,et al.  Structure analysis of soccer video with domain knowledge and hidden Markov models , 2004, Pattern Recognit. Lett..

[16]  Jianping Fan,et al.  Automatic image segmentation by integrating color-edge extraction and seeded region growing , 2001, IEEE Trans. Image Process..

[17]  Anoop Gupta,et al.  Auto-summarization of audio-video presentations , 1999, MULTIMEDIA '99.

[18]  Shih-Fu Chang,et al.  Semantic visual templates: linking visual features to semantics , 1998, Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269).

[19]  Alexander C. Loui,et al.  Finding structure in home videos by probabilistic hierarchical clustering , 2003, IEEE Trans. Circuits Syst. Video Technol..

[20]  Remi Depommier,et al.  Content-based browsing of video sequences , 1994, MULTIMEDIA '94.

[21]  M. Ibrahim Sezan,et al.  A semantic event-detection approach and its application to detecting hunts in wildlife vide , 2000, IEEE Trans. Circuits Syst. Video Technol..

[22]  Wolfgang Effelsberg,et al.  Automatic recognition of film genres (demonstration) , 1995, MULTIMEDIA '95.

[23]  John R. Kender,et al.  Lecture videos for e-learning: current research and challenges , 2004, IEEE Sixth International Symposium on Multimedia Software Engineering.

[24]  Nicu Sebe,et al.  Semisupervised learning of classifiers: theory, algorithms, and their application to human-computer interaction , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Hayit Greenspan,et al.  Probabilistic space-time video modeling via piecewise GMM , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Jianping Fan,et al.  Multimodal Salient Objects: General Building Blocks of Semantic Video Concepts , 2004, CIVR.

[27]  Chih-Jen Lin,et al.  Working Set Selection Using Second Order Information for Training Support Vector Machines , 2005, J. Mach. Learn. Res..

[28]  C.-C. Jay Kuo,et al.  Rule-based video classification system for basketball video indexing , 2000, MULTIMEDIA '00.

[29]  Wolfgang Effelsberg,et al.  Automatic recognition of film genres , 1995, MULTIMEDIA '95.

[30]  M. Smith,et al.  Video Skimming for Quick Browsing based on Audio and Image Characterization , 1995 .

[31]  Shih-Fu Chang Optimal Video Adaptation and Skimming Using a Utility-Based Framework , 2002 .

[32]  A. Murat Tekalp,et al.  Automatic soccer video analysis and summarization , 2003, IEEE Trans. Image Process..

[33]  Shih-Fu Chang,et al.  A utility framework for the automatic generation of audio-visual skims , 2002, MULTIMEDIA '02.

[34]  Fernando Pereira,et al.  Classification of video segmentation application scenarios , 2004, IEEE Transactions on Circuits and Systems for Video Technology.

[35]  Svetha Venkatesh,et al.  Toward automatic extraction of expressive elements from motion pictures: tempo , 2002, IEEE Trans. Multim..

[36]  Marcel Worring,et al.  Multimodal Video Indexing : A Review of the State-ofthe-art , 2001 .

[37]  Jenq-Neng Hwang,et al.  A real-time interactive virtual classroom multimedia distance learning system , 2001, IEEE Trans. Multim..

[38]  Michael S. Lew,et al.  Principles of Visual Information Retrieval , 2001, Advances in Pattern Recognition.

[39]  John R. Smith,et al.  Semantic Indexing of Multimedia Content Using Visual, Audio, and Text Cues , 2003, EURASIP J. Adv. Signal Process..

[40]  John Platt,et al.  Probabilistic Outputs for Support vector Machines and Comparisons to Regularized Likelihood Methods , 1999 .

[41]  Shih-Fu Chang,et al.  Computable scenes and structures in films , 2002, IEEE Trans. Multim..

[42]  Svetha Venkatesh,et al.  Towards automatic extraction of expressive elements from motion pictures: tempo , 2000, 2000 IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia (Cat. No.00TH8532).

[43]  Paul A. Viola,et al.  Boosting Image Retrieval , 2004, International Journal of Computer Vision.

[44]  Jianping Fan,et al.  Concept-oriented indexing of video databases: toward semantic sensitive retrieval and browsing , 2004, IEEE Transactions on Image Processing.

[45]  Milind R. Naphade,et al.  A probabilistic framework for semantic video indexing, filtering, and retrieval , 2001, IEEE Trans. Multim..

[46]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[47]  Chabane Djeraba,et al.  Multimedia Mining: A Highway to Intelligent Multimedia Documents , 2002, Multimedia Systems and Applications.

[48]  Tsuhan Chen,et al.  Audio Feature Extraction and Analysis for Scene Segmentation and Classification , 1998, J. VLSI Signal Process..

[49]  Wolfgang Effelsberg,et al.  Scene Determination Based on Video and Audio Features , 1999, Proceedings IEEE International Conference on Multimedia Computing and Systems.

[50]  Shih-Fu Chang,et al.  Learning Structured Visual Detectors from User Input at Multiple Levels , 2001, Int. J. Image Graph..

[51]  Jay F. Nunamaker,et al.  A natural language approach to content-based video indexing and retrieval for interactive e-learning , 2004, IEEE Transactions on Multimedia.

[52]  Edward Y. Chang,et al.  CBSA: content-based soft annotation for multimodal image retrieval using Bayes point machines , 2003, IEEE Trans. Circuits Syst. Video Technol..

[53]  Shahram Ebadollahi,et al.  Echocardiogram videos: summarization, temporal segmentation and browsing , 2002, Proceedings. International Conference on Image Processing.