Improving accompanied Flamenco singing voice transcription by combining vocal detection and predominant melody extraction

While recent approaches to automatic melody transcription of accompanied flamenco singing achieve promising pitch accuracy, guitar sections mistakenly transcribed as voice remain a major limitation on overall precision. To reduce the number of false positives in voicing detection, we propose a fundamental frequency contour estimation method that extends pitch-salience-based predominant melody extraction [3] with a vocal detection classifier based on timbre and pitch contour characteristics. Pitch contour segments estimated by the predominant melody extraction algorithm are rejected if they contain a high percentage of frames classified as non-vocal. After the tuning frequency is estimated, the remaining pitch contour is segmented into single note events in an iterative approach. The resulting symbolic representations are evaluated against manually corrected transcriptions on a frame-by-frame level. For two small flamenco datasets covering a variety of singers and audio quality, we observe a significant reduction of the voicing false alarm rate, an improved voicing F-measure, and an increased overall transcription accuracy. We furthermore demonstrate the advantage of a vocal detection model trained on genre-specific material. The presented case study is limited to the transcription of flamenco singing, but the general framework can be extended to other styles with genre-specific instrumentation.
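The contour rejection step and the frame-level voicing evaluation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the 0.5 rejection threshold, and the convention that a pitch of 0 Hz marks an unvoiced frame are all assumptions made for illustration; the abstract only states that segments with "a high percentage" of non-vocal frames are rejected. The voicing false alarm rate is computed as the fraction of reference-unvoiced frames that are estimated as voiced, and the voicing F-measure is assumed here to be the harmonic mean of frame-level voicing precision and recall.

```python
import numpy as np

def reject_non_vocal_contours(f0, vocal_flags, max_non_vocal_ratio=0.5):
    """Zero out pitch contour segments dominated by non-vocal frames.

    f0: per-frame pitch in Hz, with 0 marking unvoiced frames (assumed convention).
    vocal_flags: per-frame boolean output of some vocal detector.
    max_non_vocal_ratio: hypothetical threshold, not taken from the paper.
    """
    f0 = np.asarray(f0, dtype=float)
    vocal_flags = np.asarray(vocal_flags, dtype=bool)
    out = f0.copy()
    voiced = f0 > 0
    # Locate runs of consecutive voiced frames as (start, end) pairs, end exclusive.
    padded = np.concatenate(([False], voiced, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    for start, end in zip(edges[::2], edges[1::2]):
        non_vocal_ratio = 1.0 - vocal_flags[start:end].mean()
        if non_vocal_ratio > max_non_vocal_ratio:
            out[start:end] = 0.0  # reject the whole segment
    return out

def voicing_scores(est_f0, ref_f0):
    """Return (voicing false alarm rate, voicing F-measure) on a frame level."""
    est_v = np.asarray(est_f0, dtype=float) > 0
    ref_v = np.asarray(ref_f0, dtype=float) > 0
    tp = int(np.sum(est_v & ref_v))
    fp = int(np.sum(est_v & ~ref_v))
    fn = int(np.sum(~est_v & ref_v))
    n_unvoiced = int(np.sum(~ref_v))
    false_alarm = fp / n_unvoiced if n_unvoiced else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return false_alarm, f_measure

# Toy example: the second segment (frames 6-7) stands in for a guitar passage.
estimated = [0, 220, 220, 220, 0, 0, 330, 330, 0]
vocal = [False, True, True, False, False, False, False, False, False]
reference = [0, 220, 220, 220, 0, 0, 0, 0, 0]

filtered = reject_non_vocal_contours(estimated, vocal)
print(voicing_scores(estimated, reference))  # before rejection
print(voicing_scores(filtered, reference))   # after rejection
```

In this toy input the first segment is kept because only one of its three frames is flagged as non-vocal, while the second segment is removed entirely, which lowers the false alarm rate without affecting recall.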

[1] A. Caballero, et al. El cante flamenco, 1994.

[2] Koby Crammer, et al. On the Learnability and Design of Output Codes for Multiclass Problems, 2002, Machine Learning.

[3] Perfecto Herrera, et al. Comparing audio descriptors for singing voice detection in music audio files, 2007.

[4] Chih-Jen Lin, et al. LIBLINEAR: A Library for Large Linear Classification, 2008, J. Mach. Learn. Res.

[5] Haizhou Li, et al. On fusion of timbre-motivated features for singing voice detection and singer identification, 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[6] Ian H. Witten, et al. The WEKA data mining software: an update, 2009, SKDD.

[7] Hiromasa Fujihara, et al. Timbre and Melody Features for the Recognition of Vocal Activity and Instrumental Solos in Polyphonic Music, 2011, ISMIR.

[8] Preeti Rao, et al. Context-Aware Features for Singing Voice Detection in Polyphonic Music, 2011, Adaptive Multimedia Retrieval.

[9] Jordi Bonada, et al. Predominant Fundamental Frequency Estimation vs Singing Voice Separation for the Automatic Transcription of Accompanied Flamenco Singing, 2012, ISMIR.

[10] Jean-Luc Rouas, et al. Exploiting Semantic Content for Singing Voice Detection, 2012, 2012 IEEE Sixth International Conference on Semantic Computing.

[11] Emilia Gómez, et al. Melody Extraction From Polyphonic Music Signals Using Pitch Contour Characteristics, 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[12] Alvaro Pardo, et al. Separation and Classification of Harmonic Sounds for Singing Voice Detection, 2012, CIARP.

[13] Nadine Kroher, et al. The Flamenco Cante: Automatic Characterization of Flamenco Singing by Analyzing Audio Recordings, 2013.

[14] Emilia Gómez, et al. Towards Computer-Assisted Flamenco Transcription: An Experimental Comparison of Automatic Transcription Algorithms as Applied to A Cappella Singing, 2013, Computer Music Journal.