Implementation of a practical query-by-singing/humming (QbSH) system and its commercial applications

In this paper, we propose a practical query-by-singing/humming (QbSH) system that, unlike conventional QbSH systems, builds its reference database (DB) from polyphonic music tracks such as MP3 and AAC files. To create the reference DB, we propose a method for extracting the melody from polyphonic music signals based on harmonic structure. In addition, we propose a matching engine based on modified dynamic time warping (DTW), which uses a chroma-scale representation and an asymmetric DTW path to reduce the influence of melody extraction errors. We implemented three prototypes for commercial applications such as smartphones, laptops, and karaoke. We evaluated the proposed system on monophonic and polyphonic music datasets and confirmed that its performance is acceptable for commercial applications.
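
The matching idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the semitone (MIDI-like) pitch input, the step pattern, and the open-ended alignment are all assumptions chosen for illustration. Pitches are folded onto one octave (a chroma-style distance), so octave errors from melody extraction cost nothing, and an asymmetric step pattern consumes exactly one query frame per step while the reference advances by 0, 1, or 2 frames.

```python
import math


def chroma_distance(p, q):
    """Circular distance in semitones between two pitches folded onto one octave.

    Assumption: pitches are given in semitones (e.g. MIDI note numbers),
    so pitches that differ by whole octaves have distance 0.
    """
    d = abs(p - q) % 12.0
    return min(d, 12.0 - d)


def asymmetric_dtw(query, reference):
    """Length-normalized DTW cost between a query and a reference pitch sequence.

    Assumed step pattern (one common asymmetric choice, not taken from the paper):
    every step advances the query by one frame and the reference by 0, 1, or 2
    frames. The alignment starts at the first reference frame and may end at
    any reference frame.
    """
    n, m = len(query), len(reference)
    if n == 0 or m == 0:
        raise ValueError("query and reference must be non-empty")

    inf = math.inf
    # cost[i][j]: best accumulated cost aligning query[0..i] so that
    # query[i] is matched to reference[j].
    cost = [[inf] * m for _ in range(n)]
    cost[0][0] = chroma_distance(query[0], reference[0])

    for i in range(1, n):
        for j in range(m):
            best = cost[i - 1][j]                      # reference stays
            if j >= 1:
                best = min(best, cost[i - 1][j - 1])   # reference advances by 1
            if j >= 2:
                best = min(best, cost[i - 1][j - 2])   # reference advances by 2
            if best < inf:
                cost[i][j] = best + chroma_distance(query[i], reference[j])

    # Every path has exactly n steps, so dividing by n gives a per-frame cost.
    return min(cost[n - 1]) / n
```

In a retrieval setting, each reference melody in the DB would be scored this way against the hummed query and the candidates ranked by ascending cost. Key transposition between singer and recording is not handled in this sketch.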