Statistical modeling of F0 dynamics in singing voices based on Gaussian processes with multiple oscillation bases

We present a novel statistical model for the dynamics of singing behaviors, such as vibrato and overshoot, in a fundamental frequency (F0) contour. These dynamics are important cues for perceiving a singer's individuality and can serve as a useful measure in applications such as singing skill evaluation and singing voice synthesis. Most previous studies have modeled the dynamics with a second-order linear system, but automatic and accurate estimation of the model parameters has yet to be accomplished. In this paper, we first develop a complete stochastic representation of the second-order system with Gaussian processes, obtained from parametric discretization, and then propose a complete and efficient parameter estimation scheme based on the Expectation-Maximization (EM) algorithm. Experimental results show that the proposed method can decompose an F0 contour into a musical component and a dynamics component. Finally, we discuss estimating singing styles from the model parameters of each singer.

Index Terms: singing voices, fundamental frequency (F0), Gaussian processes, EM algorithm, singing voice synthesis
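To make the second-order linear system concrete, the following minimal sketch evaluates the textbook unit-step response of an underdamped second-order system. It is an illustration only, not the parameterization or estimation scheme of this paper: the names `zeta` (damping ratio) and `omega` (natural angular frequency), the function `step_response`, and all numeric values are assumptions chosen for the example. A damped setting gives an overshoot after a note change, while zero damping gives a sustained, vibrato-like oscillation.

```python
import math


def step_response(zeta, omega, times):
    """Unit-step response of omega^2 / (s^2 + 2*zeta*omega*s + omega^2).

    Hypothetical helper for illustration; covers only 0 <= zeta < 1.
    """
    if not 0.0 <= zeta < 1.0:
        raise ValueError("this sketch covers only the underdamped case 0 <= zeta < 1")
    root = math.sqrt(1.0 - zeta ** 2)
    omega_d = omega * root  # damped oscillation frequency
    response = []
    for t in times:
        envelope = math.exp(-zeta * omega * t)
        response.append(
            1.0 - envelope * (math.cos(omega_d * t) + (zeta / root) * math.sin(omega_d * t))
        )
    return response


# One second sampled every millisecond (assumed sampling, not from the paper).
times = [i / 1000.0 for i in range(1001)]

# Damped case: the contour overshoots the target note and then settles.
overshoot = step_response(0.5, 2.0 * math.pi * 5.0, times)

# Undamped case: the contour keeps oscillating, loosely resembling vibrato.
vibrato = step_response(0.0, 2.0 * math.pi * 6.0, times)

peak_overshoot = max(overshoot) - 1.0
```

Scaling the response by the size of a note change (for example, in semitones) and adding the starting pitch would give a toy F0 contour. The stochastic, Gaussian-process formulation described in the abstract is not reproduced here.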
