Multi-Version Music Search Using Acoustic Feature Union and Exact Soft Mapping

Research on audio-based music retrieval has concentrated primarily on refining audio features to improve search quality; much less work has addressed the time efficiency of music audio search. Representing music audio documents in an indexable format provides a mechanism for achieving that efficiency. To this end, we suggest Exact Locality Sensitive Mapping (ELSM), which joins the concatenated feature sets with soft hash values. On this basis we propose two audio-based music indexing techniques, ELSM and Soft Locality Sensitive Hash (SoftLSH), both of which use an optimized Feature Union (FU) set of extracted audio features. We make two contributions. First, the principle of similarity-invariance is applied in summarizing audio feature sequences and used in training semantic audio representations based on regression. Second, soft hash values are pre-calculated to help locate the search range more accurately and to increase the collision probability among similar features. Our algorithms are implemented in a demonstration system that shows how multi-version audio documents can be retrieved and evaluated. Experimental evaluation on a real "multi-version" audio dataset confirms the practicality of ELSM and SoftLSH with FU and indicates that the algorithms are effective for both multi-version detection (online query, one query vs. multiple objects) and same-content detection (batch queries, multiple queries vs. one object).
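The general idea behind soft hash values, probing more than one bucket so that similar features collide more often, can be illustrated with a minimal Python sketch. This is not the ELSM or SoftLSH algorithm of this work: it assumes a single Gaussian-projection (p-stable) hash function, and every name in it (`make_hash`, `soft_keys`, `margin`, the toy feature vectors) is hypothetical and chosen only for illustration.

```python
import math
import random
from collections import defaultdict


def make_hash(dim, width, seed):
    """Build one projection v -> (a.v + b) / width with Gaussian a (assumed scheme)."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, width)

    def project(v):
        return (sum(x * y for x, y in zip(a, v)) + b) / width

    return project


def hard_key(project, v):
    """Ordinary LSH bucket: the integer part of the projection."""
    return math.floor(project(v))


def soft_keys(project, v, margin=0.25):
    """Exact bucket, plus the adjacent bucket when v lies near a bucket boundary."""
    p = project(v)
    k = math.floor(p)
    frac = p - k
    keys = [k]
    if frac < margin:
        keys.append(k - 1)
    elif frac > 1.0 - margin:
        keys.append(k + 1)
    return keys


def build_index(project, items):
    """Store each item name under its hard bucket key."""
    index = defaultdict(list)
    for name, vec in items.items():
        index[hard_key(project, vec)].append(name)
    return index


def query(project, index, items, q, margin=0.25):
    """Collect candidates from the soft buckets and rank them by Euclidean distance."""
    candidates = set()
    for k in soft_keys(project, q, margin):
        candidates.update(index.get(k, []))
    return sorted(candidates, key=lambda name: math.dist(items[name], q))


# Toy feature vectors standing in for summarized audio features (made up).
items = {
    "song_a": [0.10, 0.80, 0.30],
    "song_a_cover": [0.12, 0.79, 0.33],
    "song_b": [0.90, 0.10, 0.70],
}
project = make_hash(dim=3, width=0.5, seed=7)
index = build_index(project, items)
ranked = query(project, index, items, items["song_a"])
```

In this sketch only the query is expanded to neighbouring buckets; the stored items keep a single key, so the index does not grow. Whether a near-duplicate such as `song_a_cover` is actually retrieved depends on the random projection, the bucket `width`, and `margin`.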
