Music identification via vocabulary tree with MFCC peaks

In this paper, we propose a Vocabulary Tree based framework for music identification, where the goal is to recognize which song in a database a query fragment comes from. The key to high recognition precision within this framework is a novel feature, MFCC Peaks, which combines MFCC and Spectral Peaks features. Our approach consists of three stages. First, we build the Vocabulary Tree from 2 million MFCC Peaks features extracted from hundreds of songs. Then each song in the database is quantized into a set of words by passing each of its features from the root down to a leaf. Finally, given a query fragment, we apply the same quantization procedure to it, score the songs in the database according to the TF-IDF scheme, and return the best matches. Experimental results demonstrate that the proposed feature has strong discriminative and generalization ability. Further experiments show that our approach scales well with the size of the database. A comparison also shows that our algorithm achieves approximately the same retrieval precision as other state-of-the-art methods while costing less time and memory.
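The three stages can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the tree is trained or how scores are normalized, so the sketch assumes hierarchical k-means for tree construction and cosine similarity between TF-IDF vectors for scoring. Toy 2-D tuples stand in for MFCC Peaks features, and all names (`build_tree`, `quantize`, `build_index`, `identify`, the `songs` data) are hypothetical.

```python
import math
from collections import Counter

def dist2(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centers):
    """Index of the center closest to p."""
    return min(range(len(centers)), key=lambda j: dist2(p, centers[j]))

def kmeans(points, k, iters=10):
    """Plain k-means with a deterministic, evenly spaced initialization."""
    centers = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # An empty group keeps its previous center.
        centers = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers

def build_tree(points, k, depth, leaves, center=None):
    """Stage 1 (assumed hierarchical k-means): each leaf becomes one word."""
    node = {"center": center, "children": [], "word": None}
    if depth == 0 or len(set(points)) < k:
        node["word"] = len(leaves)
        leaves.append(node)
        return node
    centers = kmeans(points, k)
    groups = [[] for _ in centers]
    for p in points:
        groups[nearest(p, centers)].append(p)
    for c, g in zip(centers, groups):
        if g:
            node["children"].append(build_tree(g, k, depth - 1, leaves, c))
    return node

def quantize(tree, feature):
    """Stage 2: walk from the root to a leaf, following the nearest child."""
    node = tree
    while node["children"]:
        node = min(node["children"], key=lambda ch: dist2(feature, ch["center"]))
    return node["word"]

def tfidf_vector(words, idf):
    """L2-normalized TF-IDF vector; words unseen in the database get weight 0."""
    tf = Counter(words)
    vec = {w: (n / len(words)) * idf.get(w, 0.0) for w, n in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {w: v / norm for w, v in vec.items()}

def build_index(tree, songs):
    """Quantize every song and precompute its TF-IDF vector."""
    docs = {name: [quantize(tree, f) for f in feats] for name, feats in songs.items()}
    df = Counter(w for words in docs.values() for w in set(words))
    idf = {w: math.log(len(docs) / d) for w, d in df.items()}
    return {name: tfidf_vector(ws, idf) for name, ws in docs.items()}, idf

def identify(tree, index, idf, query_feats):
    """Stage 3: rank songs by cosine similarity to the query (an assumption)."""
    q = tfidf_vector([quantize(tree, f) for f in query_feats], idf)
    scores = {name: sum(q[w] * vec.get(w, 0.0) for w in q)
              for name, vec in index.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy database: made-up 2-D tuples, not real MFCC Peaks features.
songs = {
    "song_a": [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9)],
    "song_b": [(10.0, 0.0), (10.2, 0.1), (5.0, 5.1), (4.9, 5.0)],
    "song_c": [(0.0, 10.0), (0.1, 9.8), (10.0, 10.0), (9.9, 10.1)],
}
training = [f for feats in songs.values() for f in feats]
tree = build_tree(training, k=2, depth=3, leaves=[])
index, idf = build_index(tree, songs)

# A noisy fragment resembling part of song_a.
ranking = identify(tree, index, idf, [(0.02, 0.05), (5.05, 5.0)])
print(ranking)
```

Because each feature is compared with only `k` centers per level, quantization cost grows with the tree depth rather than with the vocabulary size, which is the usual argument for why this kind of index scales with database size.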
