Probabilistic multimedia objects (multijects): a novel approach to video indexing and retrieval in multimedia systems

This paper proposes a novel probabilistic scheme for bridging the gap between low-level media features and high-level semantics, in which scenes can be indexed at a semantic level. The fundamental components of the framework are sites, objects and events. Detecting the presence of an instance of one of these classes influences the probability that instances of the other classes are present. Instances are detected using probabilistic multimedia objects: multijects. Indexing with multijects can handle queries posed at the semantic level. Multijects are built in a Markovian framework. Two ways of building multijects from low-level features, fusing features from multiple modalities, are presented. A probabilistic framework is also envisioned to encode higher-level relationships between multijects, raising or lowering the probabilities that various multijects occur concurrently. A concrete implementation is presented by developing multijects that represent the higher-level concepts "explosion" and "waterfall". The models are evaluated by using the multijects to detect explosions and waterfalls in movies. Results reveal that the multijects detect these events with greater accuracy and are able to segment the video into scenes that contain explosions and waterfalls.
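The abstract states that multijects are built in a Markovian framework and fuse features from several modalities, but it gives no equations. The following is a minimal Python sketch under stated assumptions, not the paper's implementation: a binary multiject is modeled as two discrete hidden Markov models (HMMs), one for "concept present" and one for "concept absent", each scored with the forward algorithm and combined by Bayes' rule into P(present | observations). The early-fusion function `fuse`, the two-level audio and visual features, all model parameters, the prior, and the `segments_above` helper that merges consecutive high-probability shots into segments are hypothetical and chosen only for illustration; the paper's two fusion schemes and its learned models may differ.

```python
import math
from typing import List, Optional, Sequence, Tuple

NEG_INF = -math.inf

def safe_log(p: float) -> float:
    """log(p), mapping zero probability to -inf instead of raising."""
    return math.log(p) if p > 0.0 else NEG_INF

def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))

def fuse(audio_level: int, visual_level: int, n_visual_levels: int = 2) -> int:
    """Hypothetical early fusion: jointly quantize one audio and one visual
    feature level into a single discrete observation symbol."""
    return audio_level * n_visual_levels + visual_level

class DiscreteHMM:
    """HMM with discrete emissions; parameters are supplied, not trained here."""

    def __init__(self, start: List[float], trans: List[List[float]], emit: List[List[float]]):
        self.start = start  # start[i]    = P(q_1 = i)
        self.trans = trans  # trans[i][j] = P(q_{t+1} = j | q_t = i)
        self.emit = emit    # emit[i][k]  = P(o_t = k | q_t = i)

    def log_likelihood(self, obs: Sequence[int]) -> float:
        """Forward algorithm in log space: returns log P(obs | model)."""
        n = len(self.start)
        alpha = [safe_log(self.start[i]) + safe_log(self.emit[i][obs[0]]) for i in range(n)]
        for o in obs[1:]:
            alpha = [
                logsumexp([alpha[i] + safe_log(self.trans[i][j]) for i in range(n)])
                + safe_log(self.emit[j][o])
                for j in range(n)
            ]
        return logsumexp(alpha)

class Multiject:
    """Sketch of a binary multiject: P(concept present | observations) via
    Bayes' rule over a 'present' HMM and an 'absent' HMM."""

    def __init__(self, present: DiscreteHMM, absent: DiscreteHMM, prior: float):
        self.present, self.absent, self.prior = present, absent, prior

    def probability(self, obs: Sequence[int]) -> float:
        lp = math.log(self.prior) + self.present.log_likelihood(obs)
        la = math.log(1.0 - self.prior) + self.absent.log_likelihood(obs)
        return math.exp(lp - logsumexp([lp, la]))

def segments_above(probs: Sequence[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Hypothetical helper: merge consecutive shots whose probability exceeds
    the threshold into half-open [start, end) shot ranges."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, p in enumerate(probs):
        if p > threshold and start is None:
            start = i
        elif p <= threshold and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(probs)))
    return runs

# Illustrative (not learned) parameters for an "explosion" multiject over the
# 4 fused symbols: 0 quiet/steady, 1 quiet/flash, 2 loud/steady, 3 loud/flash.
explosion = Multiject(
    present=DiscreteHMM(
        start=[0.5, 0.5],
        trans=[[0.7, 0.3], [0.3, 0.7]],
        emit=[[0.05, 0.05, 0.1, 0.8], [0.1, 0.2, 0.5, 0.2]],
    ),
    absent=DiscreteHMM(start=[1.0], trans=[[1.0]], emit=[[0.6, 0.2, 0.15, 0.05]]),
    prior=0.1,
)

# Three toy "shots", each a sequence of fused audio-visual symbols.
shots = [
    [fuse(0, 0)] * 4,
    [fuse(1, 1), fuse(1, 1), fuse(1, 0), fuse(1, 1)],
    [fuse(0, 0), fuse(0, 1), fuse(0, 0), fuse(0, 0)],
]
probs = [explosion.probability(s) for s in shots]
print(segments_above(probs))  # prints [(1, 2)]
```

Giving the "absent" hypothesis its own HMM turns the output into a posterior probability rather than a raw likelihood, which is the kind of quantity a higher-level probabilistic model of relationships between multijects could then raise or lower.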
