Audio as a support to scene change detection and characterization of video sequences

A challenging problem in constructing video databases is the organization of video information. Algorithms that can organize video information according to the semantic content of the data are becoming increasingly important, because they allow tasks such as indexing and retrieval to work more efficiently. Until now, attempts to extract semantic information have relied on video information alone. Because a video sequence is a 2-D projection of a 3-D scene, video processing has shown its limitations, especially for problems such as object identification and object tracking, and this reduces the ability to extract semantic characteristics. One way to overcome the problem is to use additional information, and the associated audio signal is the most natural source of it. This paper presents a technique that combines video and audio information for classification and indexing purposes. The classification is performed on the audio signal; a general framework that uses the results of this classification is then proposed for organizing video information.