Temporal Bayesian Network-based contextual framework for structured information mining

Video data from specific domains contain rich temporal structures that aid classification. In this paper, we exploit this temporal structure to classify video sequences into different classes. We propose the following perceptual features: Time-to-Collision, shot length and transition, and temporal motion activity. These features are used to characterize several video classes, which yields a high-level sequence classification. The resulting high-level queries map more easily onto the perceptual features, making content-based retrieval systems more accessible. Temporal fusion of the perceptual features forms higher-level structures, which can be handled effectively with Dynamic Bayesian Networks (DBNs). DBNs combine the power of statistical inference and learning with the temporal and contextual knowledge of the problem. Modeling and experimental results are presented for several key applications, such as sequence identification, sports highlight extraction, and news program parsing.
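A hidden Markov model is the simplest Dynamic Bayesian Network. It has one discrete hidden variable per time slice, and those variables are linked across slices. The sketch below is a minimal illustration of how such a temporal model combines per-shot evidence with temporal context. It rests on hypothetical assumptions. The two states (`highlight`, `normal`), the discretization of motion activity into `low`/`high`, and all probabilities are made-up illustrative values, not the paper's learned parameters. Viterbi decoding is also only one possible inference choice.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence for obs, computed in log space."""
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] + math.log(trans_p[r][s])) for r in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Start from the best final state and follow the back-pointers
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Hypothetical model: every value below is illustrative only
states = ("highlight", "normal")
start_p = {"highlight": 0.2, "normal": 0.8}
trans_p = {
    "highlight": {"highlight": 0.7, "normal": 0.3},
    "normal": {"highlight": 0.1, "normal": 0.9},
}
# Hypothetical per-shot motion-activity level, discretized to low/high
emit_p = {
    "highlight": {"low": 0.1, "high": 0.9},
    "normal": {"low": 0.8, "high": 0.2},
}

shots = ["low", "low", "high", "high", "high", "low", "low"]
labels = viterbi(shots, states, start_p, trans_p, emit_p)
print(labels)
# ['normal', 'normal', 'highlight', 'highlight', 'highlight', 'normal', 'normal']
```

At the third shot, `normal` has the higher partial-path score at that step. The decoded path still labels that shot `highlight`, because the high-activity shots that follow, combined with the transition probabilities, favor that state. This is the kind of temporal context that a DBN contributes beyond per-shot classification.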
