Audio identification based on spectral modeling of bark-bands energy and synchronization through onset detection

In this paper, we present, for the first time, the IRCAM audio fingerprinting system for identifying audio in streams. The baseline system relies on a double-nested Short-Time Fourier Transform (STFT). The first STFT computes the energies of a filter bank; these energies are then modelled over 2 s using a second STFT. We then present two recent improvements to the system: first, the use of perceptual scales for amplitude and frequency (Bark bands); second, the synchronization of stream and database frames using an onset detection system. These improvements are evaluated on a large set of real audio streams, and our results are compared with those of re-implementations of two state-of-the-art systems, from Philips and Shazam.
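The abstract only outlines the pipeline, so what follows is a minimal NumPy sketch of the general idea under stated assumptions, not the IRCAM implementation. A first STFT gives per-frame energies of Bark-spaced bands, a second STFT over roughly 2 s of each band's energy trajectory gives a modulation-spectrum fingerprint, and those 2 s segments are anchored on detected onsets. All function names and parameters (22.05 kHz audio, 2048-sample frames, 512-sample hop, 24 equal-width Bark bands, dB as the perceptual amplitude scale, 8 modulation coefficients per band) are hypothetical choices. The Bark mapping is the Zwicker and Terhardt formula [3]; the onset detector is a plain spectral-flux peak picker swapped in for the transient-peak classification method of [4].

```python
import numpy as np

def bark(f_hz):
    """Critical-band rate in Bark (Zwicker & Terhardt, 1980)."""
    f = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def bark_band_energies(x, sr, n_fft=2048, hop=512, n_bands=24):
    """First STFT: per-frame energies of n_bands equal-width Bark bands, in dB.
    Band count, frame size and the dB amplitude scale are assumptions."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2        # (n_frames, n_fft//2 + 1)
    z = bark(np.fft.rfftfreq(n_fft, d=1.0 / sr))             # Bark value of each bin
    edges = np.linspace(0.0, z[-1], n_bands + 1)
    band = np.clip(np.searchsorted(edges, z, side="right") - 1, 0, n_bands - 1)
    energies = np.stack([power[:, band == b].sum(axis=1) for b in range(n_bands)], axis=1)
    return 10.0 * np.log10(energies + 1e-12)                 # (n_frames, n_bands)

def onset_frames(energies, k=1.0):
    """Hypothetical stand-in detector: peaks of half-wave-rectified spectral flux
    that exceed mean + k * std."""
    flux = np.maximum(np.diff(energies, axis=0), 0.0).sum(axis=1)
    thr = flux.mean() + k * flux.std()
    peaks = [i for i in range(1, len(flux) - 1)
             if flux[i] > thr and flux[i] >= flux[i - 1] and flux[i] > flux[i + 1]]
    return np.array(peaks, dtype=int) + 1    # flux[i] compares frames i and i + 1

def fingerprints_at(energies, starts, seg_len, n_keep=8):
    """Second STFT: modulation spectrum of every band over seg_len frames (~2 s),
    computed only at the given start frames (e.g. detected onsets)."""
    win = np.hanning(seg_len)[:, None]
    kept, fps = [], []
    for s in starts:
        if s + seg_len > energies.shape[0]:
            continue                                         # not enough frames left
        seg = energies[s:s + seg_len]
        seg = seg - seg.mean(axis=0)                         # remove each band's mean level
        mod = np.abs(np.fft.rfft(seg * win, axis=0))         # (seg_len//2 + 1, n_bands)
        fps.append(mod[1:n_keep + 1].T.ravel())              # lowest modulation frequencies
        kept.append(s)
    n_dims = energies.shape[1] * n_keep
    return np.array(kept, dtype=int), np.array(fps).reshape(len(kept), n_dims)

if __name__ == "__main__":
    sr, hop = 22050, 512
    rng = np.random.default_rng(0)
    x = 1e-3 * rng.standard_normal(5 * sr)                   # quiet noise floor
    for t in (1.0, 2.0, 3.0, 4.0):                           # four 100 ms noise bursts
        i = int(t * sr)
        x[i:i + sr // 10] += rng.standard_normal(sr // 10)
    E = bark_band_energies(x, sr, hop=hop)
    onsets = onset_frames(E)
    seg_len = int(round(2.0 * sr / hop))                     # about 2 s of frames
    starts, F = fingerprints_at(E, onsets, seg_len)
    print(E.shape, onsets * hop / sr, F.shape)
```

Anchoring the second STFT on onsets rather than on a fixed frame grid means a stream and a database recording should both produce segments that start at the same acoustic events, whatever the offset at which the stream capture began; this is presumably the purpose of the onset-based synchronization described in the abstract.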

Geoffroy Peeters | Mathieu Ramona

[1] Ton Kalker, et al. A Highly Robust Audio Fingerprinting System With an Efficient Search Strategy, 2003.

[2] Eric Allamanche, et al. Content-based Identification of Audio Material Using MPEG-7 Low Level Description, 2001, ISMIR.

[3] E. Zwicker, et al. Analytical expressions for critical-band rate and critical bandwidth as a function of frequency, 1980.

[4] Onset Detection in Polyphonic Signals by Means of Transient Peak Classification, 2005.

[5] Xavier Rodet, et al. Toward Automatic Music Audio Summary Generation from Signal Analysis, 2002, ISMIR.

[6] Les E. Atlas, et al. Joint Acoustic and Modulation Frequency, 2003, EURASIP Journal on Applied Signal Processing 2003:7, 668–675.

[7] Edith Law, et al. Input-agreement: a new mechanism for collecting data using human computation games, 2009, CHI.

[8] Avery Wang, et al. An Industrial Strength Audio Search Algorithm, 2003, ISMIR.

[9] Steven Greenberg, et al. The modulation spectrogram: in pursuit of an invariant representation of speech, 1997, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).

[10] Geoffroy Peeters, et al. Modèles et modification du signal sonore adaptés aux caractéristiques locales, 2001.

[11] Ton Kalker, et al. A Highly Robust Audio Fingerprinting System, 2002, ISMIR.