Classification and retrieval of sound effects in audiovisual data management

We present a method for classifying sound effects that exploits time-frequency analysis of audio signals and uses hidden Markov models (HMMs) as the classifier. The proposed approach can be used to retrieve audio and video segments in studios, audiovisual libraries, and family entertainment applications; for example, video scenes of a gun fight can be retrieved by searching for the sounds of gunshots or explosions. It can also be applied to surveillance by recognizing sounds associated with criminal activity. The proposed method achieves an accuracy of 86% in sound effects classification. In addition, a query-by-example retrieval approach for sound effects is built on top of the archiving scheme and is shown to be highly efficient and effective.
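The classification step summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: log energies in equal-width DFT bands stand in for the unspecified time-frequency analysis, each sound-effect class is assumed to have its own Gaussian-emission HMM whose parameters are already trained (Baum-Welch training is omitted), and a clip is assigned to the class whose model yields the highest log-likelihood under the forward algorithm. All function names, frame sizes, and class labels are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def band_energy_frames(signal: Sequence[float], frame_len: int = 64,
                       hop: int = 32, n_bands: int = 4) -> List[List[float]]:
    """Hypothetical time-frequency front end: per-frame log energy in equal-width DFT bands."""
    half = frame_len // 2
    width = half // n_bands
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = signal[start:start + frame_len]
        power = []
        for k in range(half):  # naive DFT over bins 0 .. frame_len/2 - 1
            re = sum(s * math.cos(2 * math.pi * k * n / frame_len) for n, s in enumerate(seg))
            im = -sum(s * math.sin(2 * math.pi * k * n / frame_len) for n, s in enumerate(seg))
            power.append(re * re + im * im)
        # Sum the power inside each band; the small constant avoids log(0).
        frames.append([math.log(sum(power[b * width:(b + 1) * width]) + 1e-10)
                       for b in range(n_bands)])
    return frames


def log_gauss(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def logsumexp(vals: List[float]) -> float:
    """Numerically stable log(sum(exp(vals)))."""
    m = max(vals)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in vals))


def safe_log(p: float) -> float:
    return math.log(p) if p > 0 else -math.inf


class GaussianHMM:
    """HMM with one diagonal Gaussian per state; parameters are assumed pre-trained."""

    def __init__(self, start: List[float], trans: List[List[float]],
                 means: List[List[float]], variances: List[List[float]]):
        self.log_start = [safe_log(p) for p in start]
        self.log_trans = [[safe_log(p) for p in row] for row in trans]
        self.means = means
        self.variances = variances

    def log_likelihood(self, frames: List[List[float]]) -> float:
        """Forward algorithm in the log domain: returns log P(frames | model)."""
        if not frames:
            raise ValueError("need at least one feature frame")
        n = len(self.log_start)
        alpha = [self.log_start[j] + log_gauss(frames[0], self.means[j], self.variances[j])
                 for j in range(n)]
        for x in frames[1:]:
            alpha = [logsumexp([alpha[i] + self.log_trans[i][j] for i in range(n)])
                     + log_gauss(x, self.means[j], self.variances[j])
                     for j in range(n)]
        return logsumexp(alpha)


def classify(frames: List[List[float]], models: Dict[str, GaussianHMM]) -> str:
    """Pick the class whose HMM assigns the highest likelihood to the clip."""
    return max(models, key=lambda label: models[label].log_likelihood(frames))
```

For example, `classify(band_energy_frames(clip), models)` returns the best-scoring label, given a hypothetical dict `models` mapping class names such as `"gunshot"` to trained `GaussianHMM` instances. Working in the log domain keeps the forward recursion from underflowing on long clips.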