Automatic audio content analysis

This paper describes the theoretical framework and applications of automatic audio content analysis. Research in multimedia content analysis has so far concentrated on the video domain; here we demonstrate the strength of automatic audio content analysis. We explain the algorithms we use, including the analysis of amplitude, frequency and pitch, and simulations of human audio perception. These algorithms serve as tools for further audio content analysis. We use them in applications such as segmenting audio data streams into logical units for further processing, analyzing music, and recognizing sounds indicative of violence, such as shots, explosions and cries.
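The abstract lists amplitude, frequency and pitch analysis as basic tools and the segmentation of audio streams as one application. The sketch below shows textbook versions of these measurements on a mono signal given as floats in [-1, 1]. It is an illustration under stated assumptions, not the paper's algorithms. Several pieces are our own stand-ins: the function names, the naive DFT (used in place of an FFT), the autocorrelation pitch estimator and its 90% peak rule, and the -40 dB silence threshold used for segmentation. The sketch also does not model the perception-based analysis the paper describes.

```python
import math


def sine(freq, rate, n, amp=1.0):
    """Synthetic test tone: n samples of a sine wave (illustrative helper)."""
    return [amp * math.sin(2 * math.pi * freq * i / rate) for i in range(n)]


def rms_db(frame):
    """Amplitude analysis: RMS level of a frame in dB relative to full scale.

    Assumes float samples in [-1.0, 1.0]; an empty or silent frame gives -inf.
    This is a physical level, not a perceptual loudness model.
    """
    if not frame:
        return float("-inf")
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def dominant_frequency(frame, rate):
    """Frequency analysis: frequency (Hz) of the strongest DFT bin, DC excluded.

    Naive O(N^2) DFT for readability (frame needs at least 2 samples);
    an FFT computes the same bins much faster.
    """
    n = len(frame)
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_k, best_mag = k, mag
    return best_k * rate / n


def autocorr_pitch(frame, rate, fmin=50.0, fmax=1000.0):
    """Pitch estimate (Hz) from the normalized autocorrelation, or None.

    Assumed rule: take the smallest lag in the [fmin, fmax] search range that
    is a local maximum and reaches 90% of the strongest autocorrelation peak.
    """
    n = len(frame)
    lag_min = max(1, int(rate / fmax))
    lag_max = min(n - 2, int(rate / fmin))
    if lag_min > lag_max:
        return None
    # Dividing by (n - lag) keeps long lags from being penalized for summing fewer products.
    r = {lag: sum(frame[i] * frame[i + lag] for i in range(n - lag)) / (n - lag)
         for lag in range(lag_min - 1, lag_max + 2)}
    peak = max(r[lag] for lag in range(lag_min, lag_max + 1))
    if peak <= 0:
        return None
    for lag in range(lag_min, lag_max + 1):
        if r[lag] >= 0.9 * peak and r[lag] >= r[lag - 1] and r[lag] >= r[lag + 1]:
            return rate / lag
    return None


def split_on_silence(samples, frame_len, threshold_db=-40.0):
    """Hypothetical segmentation: (start, end) sample ranges of loud frame runs.

    A frame counts as loud when its RMS level exceeds threshold_db (assumed value).
    """
    segments, start = [], None
    for pos in range(0, len(samples), frame_len):
        loud = rms_db(samples[pos:pos + frame_len]) > threshold_db
        if loud and start is None:
            start = pos
        elif not loud and start is not None:
            segments.append((start, pos))
            start = None
    if start is not None:
        segments.append((start, len(samples)))
    return segments
```

For a pure tone such as `sine(200.0, 8000, 400)`, `dominant_frequency` and `autocorr_pitch` should both report roughly 200 Hz. They diverge on complex tones. Autocorrelation measures periodicity, so it can return the fundamental of a harmonic sound even when the strongest spectral peak is one of its harmonics.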