A PUBLIC AUDIO IDENTIFICATION EVALUATION FRAMEWORK FOR BROADCAST MONITORING

This paper presents the first public framework for evaluating audio fingerprinting techniques. Although audio identification is a very active domain in both industry and academia, there is at present no common basis for comparing the proposed techniques, because corpora and evaluation protocols differ from one author to another. The framework presented here corresponds to a use case in which audio excerpts have to be detected in a radio broadcast stream. This scenario naturally provides a large variety of audio distortions, which makes the task a real challenge for fingerprinting systems. Scoring metrics are discussed with regard to this particular scenario. We then describe a complete evaluation framework comprising an audio corpus, the related ground-truth annotation, and a toolkit for computing the scoring metrics. Finally, we detail an example application of this framework, which took place during the evaluation campaign of the Quaero project. The framework is publicly available for download and constitutes a simple yet thorough platform that the audio identification community can use to encourage reproducible results.
