Implementation of a matching engine for a practical query-by-singing/humming system

This paper proposes a matching engine for a query-by-singing/humming (QbSH) system whose database is constructed from polyphonic recordings such as MP3 files. Building the database from recordings makes the system more practical, since it saves the trouble of gathering MIDI files. The pitch sequences transcribed from polyphonic recordings may contain errors; to reduce their influence, the matching engine uses chroma-scale representation, compensation, and asymmetric dynamic time warping. We propose the use of saturated distances and verify that they perform better than the commonly used absolute difference and squared difference. In our experiment, the QbSH system achieved a mean reciprocal rank of 0.725 for 1000 singing/humming queries when searching a database of 28 hours of audio data.
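
To make the idea of a saturated distance on a chroma scale concrete, here is a minimal Python sketch. It is not the paper's implementation: the function names, the saturation cap of 3 semitones, the step pattern (each query frame advances by one while the reference stays or advances by one or two frames), the free start and end points in the reference, and the normalization by query length are all assumptions chosen for illustration. Pitches are assumed to be given in semitones (for example, MIDI note numbers).

```python
import math


def saturated_chroma_distance(a, b, cap=3.0):
    """Absolute pitch difference on a 12-semitone circle, clipped at `cap`.

    `cap` is a hypothetical saturation level, not a value from the paper.
    """
    d = abs(a - b) % 12.0      # fold octaves away (chroma scale)
    d = min(d, 12.0 - d)       # shortest way around the circle
    return min(d, cap)         # saturate so one gross error costs at most `cap`


def asymmetric_dtw(query, reference, cap=3.0):
    """Align every query frame to some reference frame and return the
    average saturated distance along the best path.

    Assumed step pattern: the query index always advances by one; the
    reference index stays, or advances by one or two frames.
    """
    n, m = len(query), len(reference)
    if n == 0 or m == 0:
        raise ValueError("query and reference must be non-empty")

    # The first query frame may start anywhere in the reference.
    prev = [saturated_chroma_distance(query[0], r, cap) for r in reference]

    for i in range(1, n):
        cur = [math.inf] * m
        for j in range(m):
            best = prev[j]                      # reference stays
            if j >= 1:
                best = min(best, prev[j - 1])   # reference advances by 1
            if j >= 2:
                best = min(best, prev[j - 2])   # reference advances by 2
            cur[j] = best + saturated_chroma_distance(query[i], reference[j], cap)
        prev = cur

    # The alignment may end anywhere in the reference.
    return min(prev) / n


if __name__ == "__main__":
    melody = [60, 62, 64, 65, 67]
    octave_up = [p + 12 for p in melody]        # same chroma, different octave
    print(asymmetric_dtw(octave_up, melody))
```

With a plain absolute or squared difference, a single octave error or spurious pitch in the transcribed reference can dominate the path cost; clipping each local distance bounds the damage any one frame can do, which is the intuition behind saturation in this sketch.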
