Incorporating Domain Knowledge with Video and Voice Data Analysis in News Broadcasts

This paper addresses video annotation, indexing and retrieval, and shows how a set of tools can be combined with domain knowledge to detect narrative structure in broadcast news. The initial structure is detected using low-level audio-visual processing in conjunction with domain knowledge. Higher-level processing can then use this initial structure to direct further analysis that refines and extends the initial classification. The detected structure breaks a news broadcast into segments, each of which contains a single topic of discussion. Further, the segments are labeled as a) anchor person or reporter, b) footage with a voice-over, or c) sound bite. This labeling can be used to provide a summary, for example by presenting a thumbnail for each reporter present in a section of the video. Incorporating domain knowledge into the computation allows high-level processing to be applied in a more directed way, substantially improving the efficiency of the effort expended. This allows valid deductions to be made about the structure and semantics of the contents of a news video stream, as demonstrated by our experiments on CNN news broadcasts.
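The abstract does not spell out the actual rules, so the following is only a minimal sketch of how domain knowledge might turn low-level cues into the three labels, topic segments and per-reporter thumbnails. It assumes that each shot already carries a face-on-screen flag (e.g. from a face detector) and a speaker label (e.g. from speaker segregation). It also assumes that the set of presenter voices, and which of them belong to anchors, is known in advance as domain knowledge about the broadcast. The names `Shot`, `label_shot`, `segment_stories` and `thumbnails` are hypothetical. The boundary rule used here, where a new story starts when an anchor appears on camera after non-anchor material, is an illustrative simplification rather than the paper's method.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Shot:
    """Hypothetical per-shot record assumed to come from low-level processing."""
    start: float            # shot start time in seconds
    end: float              # shot end time in seconds
    face_on_screen: bool    # assumed output of a face detector
    speaker: Optional[str]  # assumed speaker label; None if no speech


def label_shot(shot: Shot, presenters: Set[str]) -> str:
    """Assign one of the three labels using assumed domain-knowledge rules."""
    if shot.speaker is None:
        return "other"
    if shot.speaker in presenters:
        # A presenter's voice: on camera -> anchor/reporter, otherwise it is
        # narrating over footage -> voice-over.
        return "anchor_or_reporter" if shot.face_on_screen else "voice_over"
    # A non-presenter speaking on camera is treated as an interview clip.
    return "sound_bite" if shot.face_on_screen else "other"


def segment_stories(shots: List[Shot], presenters: Set[str],
                    anchors: Set[str]) -> List[List[Tuple[Shot, str]]]:
    """Group labeled shots into stories (simplified, hypothetical boundary rule).

    A new story begins when an anchor appears on camera after a shot that
    was not an on-camera anchor shot.
    """
    stories: List[List[Tuple[Shot, str]]] = []
    prev_anchor = False
    for shot in shots:
        label = label_shot(shot, presenters)
        is_anchor = label == "anchor_or_reporter" and shot.speaker in anchors
        if not stories or (is_anchor and not prev_anchor):
            stories.append([])
        stories[-1].append((shot, label))
        prev_anchor = is_anchor
    return stories


def thumbnails(story: List[Tuple[Shot, str]]) -> Dict[str, float]:
    """One thumbnail time per presenter: start of their first on-camera shot."""
    thumbs: Dict[str, float] = {}
    for shot, label in story:
        if label == "anchor_or_reporter" and shot.speaker not in thumbs:
            thumbs[shot.speaker] = shot.start
    return thumbs


# Toy input with invented values, for illustration only.
shots = [
    Shot(0.0, 10.0, True, "anchor"),      # anchor introduces a story
    Shot(10.0, 25.0, False, "reporter1"),  # reporter narrates footage
    Shot(25.0, 32.0, True, "witness"),    # interviewee clip
    Shot(32.0, 40.0, True, "reporter1"),  # reporter stand-up
    Shot(40.0, 50.0, True, "anchor"),     # anchor opens the next story
    Shot(50.0, 60.0, False, "anchor"),    # anchor voice over footage
]
stories = segment_stories(shots, {"anchor", "reporter1"}, {"anchor"})
for story in stories:
    print([label for _, label in story], thumbnails(story))
```

Separating anchors from other presenters means that a reporter's stand-up following their own voice-over stays inside the same story. A rule that opened a story at every on-camera presenter would split it.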
