Efficient Index-Based Audio Matching

Given a large database of music recordings, the goal of classical audio identification is to identify a particular recording by means of a short audio fragment. Even though recent identification algorithms show a significant degree of robustness to noise, MP3 compression artifacts, and uniform temporal distortions, their notion of similarity remains close to identity. In this paper, we address a higher-level retrieval problem, which we refer to as audio matching: given a short query audio clip, the goal is to automatically retrieve all excerpts from all recordings within the database that musically correspond to the query. In contrast to classical audio identification, our matching scenario allows semantically motivated variations of the kind that typically occur in different interpretations of a piece of music. To this end, this paper presents an efficient and robust audio matching procedure that works even in the presence of significant variations, such as nonlinear temporal, dynamic, and spectral deviations, where existing algorithms for audio identification would fail. Furthermore, the combination of various deformation- and fault-tolerance mechanisms allows us to employ standard indexing techniques to obtain an efficient, index-based matching procedure, thus providing an important step towards semantically searching large-scale real-world music collections.
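
As a rough illustration of how a standard indexing technique can support fault-tolerant retrieval of excerpts, the following minimal sketch builds an inverted file over sequences of integer codewords and ranks candidate excerpts by the number of query positions that agree. It is not the procedure of this paper: the codeword representation, the names build_index and match, the min_hits parameter, and the toy data are all assumptions made for illustration, and the sketch tolerates only mismatched codewords, not the nonlinear temporal deformations discussed above.

```python
from collections import Counter, defaultdict


def build_index(database):
    """Build an inverted file: codeword -> list of (recording id, position).

    `database` maps a recording id to a sequence of integer codewords
    (a hypothetical quantized feature representation).
    """
    index = defaultdict(list)
    for doc_id, codes in database.items():
        for pos, code in enumerate(codes):
            index[code].append((doc_id, pos))
    return index


def match(index, query, min_hits):
    """Return (recording id, start, hits) for excerpts agreeing with the
    query in at least `min_hits` positions, best matches first."""
    votes = Counter()
    for j, code in enumerate(query):
        # Every occurrence of this codeword votes for the excerpt start
        # that would align it with query position j.
        for doc_id, pos in index.get(code, ()):
            start = pos - j
            if start >= 0:
                votes[(doc_id, start)] += 1
    hits = [(d, s, n) for (d, s), n in votes.items() if n >= min_hits]
    return sorted(hits, key=lambda h: (-h[2], h[0], h[1]))


# Toy data (assumed): two recordings and a four-codeword query.
database = {
    "rec_a": [3, 7, 7, 1, 4, 9, 2],
    "rec_b": [5, 3, 7, 8, 1, 4, 0],
}
index = build_index(database)
result = match(index, [3, 7, 7, 1], min_hits=3)
print(result)
```

Setting min_hits below the query length is what makes the lookup fault tolerant in this sketch: an excerpt that differs from the query in one codeword is still retrieved, only with a lower hit count.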
