Robust query-by-singing/humming system against background noise environments

Under background noise, the performance of a query-by-singing/humming (QbSH) system degrades considerably. Because the QbSH system uses human pitch information as its feature vector, a noise-robust pitch estimation algorithm is essential. This paper therefore proposes a novel pitch estimation method that integrates temporal-autocorrelation and spectral-salience methods. As a pre-processing block, spectral smoothing is applied to enhance the stationarity of the noisy input signal. The dynamic time warping (DTW) algorithm is used to calculate the similarity between the input humming signal and the MIDI database. Jang's corpus and the AURORA2 database are used as the humming and background noise signals, respectively. Compared with the standard pitch estimation algorithm of the ITU-T G.729 speech codec, the proposed method improves average pitch estimation accuracy by 11.7% at a signal-to-noise ratio (SNR) of 0 dB. It also improves the top-20 ratio and the mean reciprocal rank (MRR) of the proposed QbSH system by 7.4% and 0.13 on average, respectively.
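As an illustration of the temporal-autocorrelation half of the pitch estimator, the following is a minimal sketch and not the authors' method. It covers only normalized autocorrelation pitch picking per frame, followed by conversion to MIDI note numbers so that the contour can be compared with MIDI melodies. The spectral-salience half, the rule that integrates the two, and the spectral-smoothing pre-processing are not shown, because the abstract does not specify them. The function names (`autocorr_pitch`, `pitch_contour`, `hz_to_midi`) are hypothetical. The search range `fmin`/`fmax`, the `voicing_threshold`, and the frame and hop sizes are also assumed values.

```python
import numpy as np


def autocorr_pitch(frame, fs, fmin=80.0, fmax=800.0, voicing_threshold=0.3):
    """Estimate F0 in Hz for one frame from its normalized temporal autocorrelation.

    Returns 0.0 for frames treated as unvoiced. fmin, fmax and
    voicing_threshold are illustrative assumptions, not values from the paper.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()                          # remove DC offset
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1:]  # lags 0..n-1
    if r[0] <= 0.0:                                       # silent frame
        return 0.0
    r = r / r[0]                                          # normalize so r[0] == 1
    lag_min = int(fs / fmax)                              # shortest period searched
    lag_max = min(int(fs / fmin), n - 1)                  # longest period searched
    if lag_max <= lag_min:
        return 0.0
    lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    if r[lag] < voicing_threshold:                        # weak periodicity: unvoiced
        return 0.0
    return fs / lag


def pitch_contour(signal, fs, frame_len=512, hop=256):
    """Frame-by-frame F0 track in Hz; frame_len and hop are assumed values."""
    signal = np.asarray(signal, dtype=float)
    starts = range(0, len(signal) - frame_len + 1, hop)
    return np.array([autocorr_pitch(signal[s:s + frame_len], fs) for s in starts])


def hz_to_midi(f0):
    """Map F0 in Hz to MIDI note numbers (A4 = 440 Hz = 69); unvoiced 0 Hz stays 0."""
    f0 = np.asarray(f0, dtype=float)
    notes = np.zeros_like(f0)
    voiced = f0 > 0
    notes[voiced] = 69.0 + 12.0 * np.log2(f0[voiced] / 440.0)
    return notes
```

For a clean 220 Hz tone sampled at 8 kHz, `pitch_contour` returns values close to 220 Hz in every frame. With additive noise, the autocorrelation peak becomes less reliable, and this is the weakness that the abstract's combined temporal/spectral approach is meant to address.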
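The matching and evaluation steps can be sketched as follows. The abstract names DTW as the similarity measure between the humming query and the MIDI database and reports results as top-20 ratio and MRR, but gives no further detail. This sketch therefore makes several assumptions. It uses a plain DTW with the three standard step directions and an absolute-difference local cost, and it takes each MIDI reference to be already converted to a sequence of note numbers. Mean-centring both sequences, so that a query sung in another key can still match, is also an assumption of the sketch. The helper names (`dtw_distance`, `rank_songs`, `top_n_ratio`, `mean_reciprocal_rank`) are hypothetical. The two metrics follow their conventional definitions: the fraction of queries whose correct song ranks within the top N, and the mean of 1/rank of the correct song.

```python
import numpy as np


def dtw_distance(query, reference):
    """DTW distance between two pitch sequences given as MIDI note numbers.

    Both sequences are mean-centred first so a transposed query can still
    match; this key normalization is an assumption of the sketch.
    """
    q = np.asarray(query, dtype=float)
    r = np.asarray(reference, dtype=float)
    q = q - q.mean()
    r = r - r.mean()
    n, m = len(q), len(r)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(q[i - 1] - r[j - 1])       # local distance
            D[i, j] = cost + min(D[i - 1, j],     # query advances
                                 D[i, j - 1],     # reference advances
                                 D[i - 1, j - 1]) # both advance
    return D[n, m]


def rank_songs(query, database):
    """Return song IDs ordered from most to least similar (smallest DTW first)."""
    scores = {song: dtw_distance(query, ref) for song, ref in database.items()}
    return sorted(scores, key=scores.get)


def top_n_ratio(ranks, n=20):
    """Fraction of queries whose correct song is ranked within the top n."""
    return float(np.mean(np.asarray(ranks) <= n))


def mean_reciprocal_rank(ranks):
    """Mean of 1/rank of the correct song over all queries (rank 1 = best)."""
    return float(np.mean(1.0 / np.asarray(ranks, dtype=float)))
```

For example, if the correct songs of three queries are ranked 1, 2 and 4, the MRR is (1 + 1/2 + 1/4) / 3 ≈ 0.583. A gain of 0.13 in MRR, as reported above, therefore corresponds to correct songs moving noticeably closer to the top of the list on average.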