Efficient Vocal Melody Extraction from Polyphonic Music Signals

Melody extraction from polyphonic music is a valuable but difficult problem in music information retrieval. This paper proposes a system for automatic vocal melody extraction from polyphonic music recordings. Our approach is based on pitch salience and the construction of pitch contours. To make the salience computation more efficient, we use a two-level filter to reduce the number of peaks in the spectral transform, and we narrow the pitch range based on experimental results. For singing voice detection, we adopt a three-step filter that uses pitch contour characteristics and their distributions. Quantitative evaluation shows that our system maintains overall accuracy comparable to the state-of-the-art approaches submitted to MIREX while also achieving high computational efficiency.

DOI: http://dx.doi.org/10.5755/j01.eee.19.6.4575
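The salience stage described above can be illustrated with a minimal Python sketch. The abstract does not specify the two-level filter, the salience weighting, or the narrowed pitch range, so every detail here is an assumption rather than the authors' method: the hypothetical `filter_peaks` reads the two levels as (1) local-maximum picking and (2) a relative threshold at a placeholder -40 dB, and the hypothetical `salience` uses harmonic summation with cos² distance weighting (a common choice in contour-based melody extraction) over a placeholder 100 to 800 Hz vocal range.

```python
import math


def filter_peaks(magnitudes, freqs, rel_threshold_db=-40.0):
    """Hypothetical two-level peak filter (not the paper's exact rule).

    Level 1 keeps only local maxima of the magnitude spectrum.
    Level 2 discards maxima more than `rel_threshold_db` below the strongest.
    Returns a list of (frequency_hz, magnitude) pairs.
    """
    # Level 1: strict rise on the left, no rise on the right.
    maxima = [
        i for i in range(1, len(magnitudes) - 1)
        if magnitudes[i] > magnitudes[i - 1] and magnitudes[i] >= magnitudes[i + 1]
    ]
    if not maxima:
        return []
    # Level 2: relative threshold (dB converted to a linear amplitude ratio).
    floor = max(magnitudes[i] for i in maxima) * 10 ** (rel_threshold_db / 20.0)
    return [(freqs[i], magnitudes[i]) for i in maxima if magnitudes[i] >= floor]


def salience(peaks, f0_min=100.0, f0_max=800.0, bins_per_semitone=10,
             n_harmonics=8, alpha=0.8, tol_semitones=1.0):
    """Harmonic-summation salience over a narrowed F0 range.

    f0_min and f0_max are placeholder vocal limits, not values from the paper.
    Returns (candidate_f0s, salience_values), one entry per pitch bin.
    """
    bins_per_octave = 12 * bins_per_semitone
    n_bins = int(round(bins_per_octave * math.log2(f0_max / f0_min))) + 1
    f0s = [f0_min * 2 ** (b / bins_per_octave) for b in range(n_bins)]
    values = []
    for f0 in f0s:
        total = 0.0
        for h in range(1, n_harmonics + 1):
            target = h * f0
            best = 0.0
            for f, m in peaks:
                d = abs(12 * math.log2(f / target))  # distance in semitones
                if d <= tol_semitones:
                    # cos^2 weighting: full credit on the harmonic, zero at the edge.
                    w = math.cos(math.pi / 2 * d / tol_semitones) ** 2
                    best = max(best, w * m)
            total += alpha ** (h - 1) * best  # higher harmonics count less
        values.append(total)
    return f0s, values


# Synthetic frame: harmonics of 220 Hz on a 10 Hz grid, plus one weak spurious peak.
freqs = [i * 10.0 for i in range(300)]
mags = [0.001] * 300
for k in range(1, 11):
    mags[22 * k] = 1.0
mags[150] = 0.005  # local maximum, but below the -40 dB floor, so level 2 drops it

peaks = filter_peaks(mags, freqs)
f0s, sal = salience(peaks)
best_f0 = f0s[sal.index(max(sal))]  # lands within a fraction of a semitone of 220 Hz
```

In a sketch like this, both efficiency measures shrink the triple loop in `salience`: pruning weak peaks shortens the innermost loop, and narrowing `f0_min`/`f0_max` reduces the number of pitch bins. Contour creation and the three-step voice-detection filter are not shown.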
