Enhanced video handling based on audio analysis

Video soundtracks are a rich source of content-based information. In this paper, we propose an audio-based approach to video indexing and handling. Audio data is analysed by frequency analysis, and music and voice are detected independently, even when they occur together. As an example application, the method is implemented in a system called Video in Time, which creates reasonable condensed versions of dramas or movies by excerpting meaningful video segments. Users can select the desired replay time from several levels, depending on how much time they can afford for viewing. Detection rates for music and voice are evaluated, and experiences with the system are reported.
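
As a rough illustration of how condensing to a chosen replay time might work, the following is a minimal sketch and not the Video in Time implementation. It assumes each segment has already been labelled as containing music and/or voice, and it uses a hypothetical priority rule (segments with both music and voice first, then segments with either one) to fill a viewing-time budget. The names `Segment`, `score`, and `condense` are illustrative assumptions; the abstract does not specify the actual selection rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: float      # seconds from the beginning of the video
    duration: float   # length of the segment in seconds
    has_music: bool   # assumed output of a music detector
    has_voice: bool   # assumed output of a voice detector


def score(seg: Segment) -> int:
    """Hypothetical priority: 2 if music and voice co-occur, 1 if only one, else 0."""
    return int(seg.has_music) + int(seg.has_voice)


def condense(segments: list[Segment], budget: float) -> list[Segment]:
    """Greedily pick labelled segments that fit within `budget` seconds.

    Segments are considered in order of descending score (earlier start
    breaks ties). Unlabelled segments (score 0) are never selected.
    The result is returned in playback order.
    """
    chosen = []
    remaining = budget
    for seg in sorted(segments, key=lambda s: (-score(s), s.start)):
        if score(seg) > 0 and seg.duration <= remaining:
            chosen.append(seg)
            remaining -= seg.duration
    return sorted(chosen, key=lambda s: s.start)


# Example: four segments, 60-second viewing budget.
segments = [
    Segment(start=0, duration=30, has_music=False, has_voice=True),
    Segment(start=30, duration=20, has_music=True, has_voice=True),
    Segment(start=50, duration=40, has_music=True, has_voice=False),
    Segment(start=90, duration=10, has_music=False, has_voice=False),
]
digest = condense(segments, budget=60)
```

Offering several replay-time levels would then amount to calling `condense` with different `budget` values; a shorter budget keeps only the highest-priority segments.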
