Melody Extraction from Polyphonic Audio of Western Opera: A Method based on Detection of the Singer's Formant

Current melody extraction approaches perform poorly on opera [1, 2]. The singer’s formant is a prominent spectral-envelope peak around 3 kHz found in the singing of professional Western opera singers [3]. In this paper we introduce a novel melody extraction algorithm for opera signals that is based on this feature. At the front end, the algorithm automatically detects the singer’s formant from the Long-Term Average Spectrum (LTAS). The same detection function is also applied to the short-term spectrum of each frame to determine whether the melody is present. The Fan Chirp Transform (FChT) [4] is used to compute pitch salience, because its high time-frequency resolution overcomes the difficulties introduced by vibrato. Subharmonic attenuation is adopted to handle octave errors, which are common in opera vocals. We also improve the FChT algorithm so that it can correct outliers in pitch detection. The performance of our method is compared with five state-of-the-art melody extraction algorithms on a newly created dataset and on parts of the ADC2004 dataset. Our algorithm achieves an accuracy of 87.5% in singer’s formant detection. In the evaluation of melody extraction, it performs best in voicing detection (91.6%), voicing false alarm (5.3%) and overall accuracy (82.3%).
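As an illustration of the front-end idea only, the following is a minimal sketch of an LTAS-based check for a spectral peak near 3 kHz. It is not the detection function of the paper, which the abstract does not specify. The function names, the 2.5-3.5 kHz band, the 1 kHz flanking bands, the 6 dB threshold, and the synthetic test signal are all assumptions chosen for the example.

```python
import numpy as np


def ltas_db(signal, sr, n_fft=2048, hop=512):
    """Long-Term Average Spectrum: Hann-windowed power spectrum
    averaged over all frames, returned in dB."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    if n_frames < 1:
        raise ValueError("signal is shorter than one analysis frame")
    power = np.zeros(n_fft // 2 + 1)
    for i in range(n_frames):
        frame = signal[i * hop:i * hop + n_fft] * window
        power += np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    return freqs, 10.0 * np.log10(power / n_frames + 1e-12)


def formant_prominence_db(freqs, level_db, band=(2500.0, 3500.0),
                          flank_width=1000.0):
    """Peak level inside the assumed formant band minus the peak level
    in the two flanking bands (hypothetical prominence measure)."""
    in_band = (freqs >= band[0]) & (freqs < band[1])
    below = (freqs >= band[0] - flank_width) & (freqs < band[0])
    above = (freqs >= band[1]) & (freqs < band[1] + flank_width)
    return level_db[in_band].max() - level_db[below | above].max()


def has_singers_formant(signal, sr, threshold_db=6.0):
    """True if the band around 3 kHz stands out by more than the
    assumed threshold; the threshold is illustrative, not tuned."""
    freqs, level_db = ltas_db(signal, sr)
    return bool(formant_prominence_db(freqs, level_db) > threshold_db)


def synth_voice(sr=16000, f0=220.0, boost=0.0, seconds=1.0):
    """Toy harmonic tone with 1/k amplitude decay and an optional
    Gaussian boost centred at 3 kHz (test signal only)."""
    t = np.arange(int(sr * seconds)) / sr
    out = np.zeros_like(t)
    for k in range(1, int((sr / 2) // f0)):
        fk = k * f0
        gain = 1.0 + boost * np.exp(-0.5 * ((fk - 3000.0) / 200.0) ** 2)
        out += (gain / k) * np.sin(2.0 * np.pi * fk * t)
    return out
```

Calling `has_singers_formant(synth_voice(boost=9.0), 16000)` exercises the boosted case, and `boost=0.0` the plain case. Passing a single frame's spectrum to `formant_prominence_db` would mimic, in spirit, applying the same function per frame as the abstract describes.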

[1] Luis Weruaga, et al. The fan-chirp transform for non-stationary harmonic signals, 2007, Signal Process.

[2] J. Sundberg. Articulatory interpretation of the "singing formant", 1974, The Journal of the Acoustical Society of America.

[3] Emilia Gómez, et al. Melody Extraction From Polyphonic Music Signals Using Pitch Contour Characteristics, 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[4] Daniel P. W. Ellis, et al. Melody Extraction from Polyphonic Music Signals: Approaches, applications, and challenges, 2014, IEEE Signal Processing Magazine.

[5] Tom E. Bishop, et al. Blind Image Restoration Using a Block-Stationary Signal Model, 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[6] Geoffroy Peeters, et al. Singing voice detection in music tracks using direct voice vibrato detection, 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[7] J. Sundberg, et al. Level and center frequency of the singer's formant, 2007, Speech, Music and Hearing Quarterly Progress and Status Report.

[8] Graham E. Poliner, et al. Melody Transcription From Music Audio: Approaches and Evaluation, 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[9] Brian B. Monson, et al. High-frequency energy in singing and speech, 2011.

[10] W. S. Brown, et al. Singer's formant in sopranos: fact or fiction?, 2001, Journal of Voice.

[11] Daniel P. W. Ellis, et al. A Quantitative Comparison of Different Approaches for Melody Extraction from Polyphonic Audio Recordings, 2006.

[12] Martín Rocamora, et al. Fan Chirp Transform for Music Representation, 2010.

[13] J. Sundberg, et al. Acoustical study of classical Peking Opera singing, 2012, Journal of Voice: Official Journal of the Voice Foundation.

[14] Emmanuel Vincent, et al. Predominant-F0 estimation using Bayesian harmonic waveform models, 2005.