Multimodal Salient Objects: General Building Blocks of Semantic Video Concepts

In this paper, we propose a novel video content representation framework that achieves a middle-level understanding of video content by using multimodal salient objects. Specifically, the framework includes: (a) a semantic-sensitive video content representation and semantic video concept modeling framework based on multimodal salient objects and Gaussian mixture models; (b) a machine learning technique for training the automatic detection functions of multimodal salient objects; and (c) a novel framework for more effective classifier training that integrates model selection and parameter estimation seamlessly in a single algorithm. Experiments on a specific domain, medical education videos, have yielded convincing results.
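To make the idea of Gaussian-mixture concept models concrete, the following is a minimal sketch under loudly stated assumptions. It is not the paper's algorithm. Each concept is modeled by a GMM over a single hypothetical 1-D feature derived from salient objects. The number of mixture components is chosen with a simpler two-stage stand-in: plain EM is run for each candidate component count, and the fit with the lowest BIC is kept. This differs from the single integrated model-selection-plus-estimation algorithm the abstract describes. All names (`fit_gmm_1d`, `select_gmm_by_bic`, `classify`) and the synthetic data are hypothetical.

```python
import math
import random


def log_gaussian(x, mean, var):
    """Log-density of a 1-D normal distribution N(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mixture_log_density(x, weights, means, variances):
    """log p(x) under a 1-D Gaussian mixture."""
    return logsumexp([math.log(w) + log_gaussian(x, m, v)
                      for w, m, v in zip(weights, means, variances)])


def fit_gmm_1d(data, k, iters=50, var_floor=1e-6):
    """Fit a k-component 1-D GMM with plain EM (hypothetical helper).

    Means start at evenly spaced quantiles of the data; returns
    (weights, means, variances, log_likelihood).
    """
    n = len(data)
    xs = sorted(data)
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    mu = sum(data) / n
    overall_var = max(sum((x - mu) ** 2 for x in data) / n, var_floor)
    variances = [overall_var] * k
    weights = [1.0 / k] * k

    for _ in range(iters):
        # E-step: responsibilities r[i][j] = P(component j | x_i).
        resp = []
        for x in data:
            logs = [math.log(w) + log_gaussian(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            norm = logsumexp(logs)
            resp.append([math.exp(lp - norm) for lp in logs])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-10:
                continue  # leave a (nearly) empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
                var_floor)
        total = sum(weights)
        weights = [w / total for w in weights]

    loglik = sum(mixture_log_density(x, weights, means, variances) for x in data)
    return weights, means, variances, loglik


def select_gmm_by_bic(data, candidate_ks=(1, 2, 3, 4)):
    """Fit one GMM per candidate k and keep the lowest-BIC fit.

    Returns (k, weights, means, variances).
    """
    n = len(data)
    best = None
    for k in candidate_ks:
        weights, means, variances, loglik = fit_gmm_1d(data, k)
        n_params = 3 * k - 1  # (k - 1) free weights + k means + k variances
        bic = -2.0 * loglik + n_params * math.log(n)
        if best is None or bic < best[0]:
            best = (bic, k, weights, means, variances)
    return best[1:]


def classify(x, concept_models):
    """Pick the concept whose GMM gives x the highest likelihood (equal priors)."""
    return max(concept_models,
               key=lambda c: mixture_log_density(x, *concept_models[c][1:]))


# Hypothetical training features for two concepts: "A" is bimodal, "B" unimodal.
rng = random.Random(42)
concept_a = ([rng.gauss(0.0, 1.0) for _ in range(200)]
             + [rng.gauss(10.0, 1.0) for _ in range(200)])
concept_b = [rng.gauss(5.0, 1.0) for _ in range(200)]
models = {"A": select_gmm_by_bic(concept_a), "B": select_gmm_by_bic(concept_b)}
print({c: m[0] for c, m in models.items()})  # selected number of components per concept
print(classify(0.2, models), classify(5.1, models))
```

BIC is used here only because it is a common, easy-to-verify criterion. Any real system would also use multi-dimensional, multimodal features and richer covariance structure than this 1-D toy.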
