Paper information - Statistical modeling of F0 dynamics in singing voices based on Gaussian processes with multiple oscillation bases

We present a novel statistical model for the dynamics of various singing behaviors, such as vibrato and overshoot, in a fundamental frequency (F0) contour. These dynamics are important cues for perceiving the individuality of a singer and can serve as a useful measure in applications such as singing skill evaluation and singing voice synthesis. While most previous studies have modeled the dynamics using a second-order linear system, automatic and accurate estimation of the model parameters has yet to be accomplished. In this paper, we first develop a complete stochastic representation of the second-order system with Gaussian processes from parametric discretization, and then propose a complete, efficient scheme for parameter estimation using the Expectation-Maximization (EM) algorithm. Experimental results show that the proposed method can decompose an F0 contour into a musical component and a dynamics component. Finally, we discuss estimating singing styles from the model parameters for each singer.

Index Terms: singing voices, fundamental frequency (F0), Gaussian processes, EM algorithm, singing voice synthesis
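To make the "second-order linear system" in the abstract concrete, here is a minimal deterministic sketch of how such a system produces overshoot when the target pitch jumps between notes. It is not the paper's Gaussian-process model or its EM estimation. The function name `simulate_second_order`, the parameters `omega` (natural angular frequency) and `zeta` (damping ratio), the cent-valued step target, and the semi-implicit Euler discretization are all assumptions chosen for illustration.

```python
import math


def simulate_second_order(targets, omega, zeta, dt=0.001):
    """Integrate y'' + 2*zeta*omega*y' + omega^2*y = omega^2*u(t).

    targets: list of target pitch values u (hypothetical unit: cents), one per step.
    Uses semi-implicit Euler; this is an illustrative choice, not the
    parametric discretization described in the paper.
    """
    y, v = targets[0], 0.0  # start at rest on the first target
    contour = []
    for u in targets:
        accel = omega ** 2 * (u - y) - 2.0 * zeta * omega * v
        v += accel * dt   # update velocity first
        y += v * dt       # then position, using the new velocity
        contour.append(y)
    return contour


# A note change: the target jumps from 0 to 100 cents after 0.2 s (2 s in total).
targets = [0.0] * 200 + [100.0] * 1800
omega = 2.0 * math.pi * 6.0  # assumed natural frequency of about 6 Hz

underdamped = simulate_second_order(targets, omega, zeta=0.3)
overdamped = simulate_second_order(targets, omega, zeta=1.5)

print("peak, zeta=0.3:", round(max(underdamped), 1))
print("peak, zeta=1.5:", round(max(overdamped), 1))
```

With a damping ratio below 1 the simulated contour overshoots the new target before settling, while a damping ratio above 1 approaches it without overshoot. The estimation problem described in the abstract runs the other way: it recovers such parameters from an observed F0 contour.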

Hirokazu Kameoka | Kunio Kashino | Daichi Mochihashi | Yasunori Ohishi | Hidehisa Nagano
