Local summarization and multi-level LSH for retrieving multi-variant audio tracks

In this paper we study the problem of detecting and grouping multi-variant audio tracks in large audio datasets. Solving it calls for a retrieval method that is both fast and reliable. Reliability, however, requires elaborate representations of audio content, which makes fast similarity-based retrieval from a large audio database difficult. To achieve a better tradeoff between retrieval quality and efficiency, we propose an approach based on local summarization and multi-level Locality-Sensitive Hashing (LSH). More precisely, each audio track is divided into multiple Continuously Correlated Periods (CCPs) of variable length according to spectral similarity. Each CCP is described by its Weighted Mean Chroma (WMC), so a track is represented as a sequence of WMCs. An adapted two-level LSH scheme is then employed to efficiently delineate a narrow relevant search region: the "coarse" hashing level restricts the search to items having a non-negligible similarity to the query, and the subsequent "refined" level returns only items showing a much higher similarity. Experimental evaluations performed on a real multi-variant audio dataset confirm that our approach supports fast and reliable retrieval of audio track variants.
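The pipeline described above (segmentation into CCPs, one WMC per CCP, then coarse-to-refined hashing) can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper, and every concrete choice in it is an assumption made for illustration: the cosine test between neighbouring chroma frames, the 0.9 threshold, weighting each frame by its total energy, sign-of-random-projection hashing, the bit counts, and all function and class names.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def split_into_ccps(frames, threshold=0.9):
    """Group consecutive chroma frames into variable-length periods.

    Assumption: a period continues while neighbouring frames have a
    cosine similarity of at least `threshold`.
    """
    if not frames:
        return []
    periods = [[frames[0]]]
    for prev, cur in zip(frames, frames[1:]):
        if cosine(prev, cur) >= threshold:
            periods[-1].append(cur)
        else:
            periods.append([cur])
    return periods


def weighted_mean_chroma(period):
    """Summarize one period as a single chroma vector.

    Assumption: each frame is weighted by its total energy (sum of bins).
    """
    dim = len(period[0])
    weights = [sum(frame) for frame in period]
    total = sum(weights)
    if total == 0:
        return [0.0] * dim
    return [sum(w * f[i] for w, f in zip(weights, period)) / total
            for i in range(dim)]


class TwoLevelLSH:
    """Hypothetical two-level index: a short coarse key, then a longer refined key."""

    def __init__(self, dim, coarse_bits=4, fine_bits=12, seed=0):
        rng = random.Random(seed)
        self.coarse_bits = coarse_bits
        # One random hyperplane per hash bit (assumed hash family).
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)]
                       for _ in range(coarse_bits + fine_bits)]
        self.table = {}  # coarse key -> {refined key -> [item ids]}

    def _keys(self, vec):
        bits = tuple(int(sum(p * v for p, v in zip(plane, vec)) >= 0)
                     for plane in self.planes)
        return bits[:self.coarse_bits], bits[self.coarse_bits:]

    def add(self, item_id, vec):
        coarse, fine = self._keys(vec)
        self.table.setdefault(coarse, {}).setdefault(fine, []).append(item_id)

    def query(self, vec):
        """Return (coarse-level candidates, refined-level candidates)."""
        coarse, fine = self._keys(vec)
        bucket = self.table.get(coarse, {})
        coarse_hits = [i for ids in bucket.values() for i in ids]
        return coarse_hits, list(bucket.get(fine, []))


# Usage: index the WMC sequence of a toy track, then query with one WMC.
toy_frames = [[1.0] + [0.0] * 11, [2.0] + [0.0] * 11,
              [0.0] * 7 + [1.0] + [0.0] * 4]
index = TwoLevelLSH(dim=12)
for n, period in enumerate(split_into_ccps(toy_frames)):
    index.add(("toy-track", n), weighted_mean_chroma(period))
coarse_hits, refined_hits = index.query([1.0] + [0.0] * 11)
```

In this sketch the refined candidates are always a subset of the coarse ones, because the refined key is only looked up inside the coarse bucket. This mirrors the idea of narrowing the search region in two steps, but the actual hash functions and parameters of the paper may differ.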
