Automatic Identification for Singing Style based on Sung Melodic Contour Characterized in Phase Plane

A stochastic representation of singing styles is proposed. The dynamic property of the melodic contour, i.e., the fundamental frequency (F0) sequence, is assumed to be the main cue for singing styles because it can characterize typical ornamentations such as vibrato. F0 signal trajectories in the phase plane are used as the basic representation. By fitting Gaussian mixture models (GMMs) to the observed F0 trajectories in the phase plane, a parametric representation is obtained as a set of GMM parameters. The effectiveness of the proposed method is confirmed through an experimental evaluation in which 94.1% accuracy for singer-class discrimination was obtained.

Although these studies try to use the local dynamics of the melodic contour as a cue for ornamentation, no systematic method has been proposed for characterizing singing styles. A lag system model for typical ornamentations was reported in (14,17-19); however, variation of singing styles was not discussed.

In this paper, we propose a stochastic phase plane as a graphical representation of singing styles and show its effectiveness for singing style discrimination. One merit of this representation is that it requires neither an explicit detection function for ornamentations such as vibrato nor estimation of the target note, so it is robust to differences in the sung melody.

In a previous paper (20), we applied this graphical representation of the F0 contour in the phase plane to a query-by-humming system and neutralized the local dynamics of the F0 sequence so that only musical information was utilized for the query. In contrast, in this study, we use the local dynamics of the F0 sequence for modeling singing styles and disregard the musical information because musical information and singing style are in a dual relation.

In this paper, we also evaluate the proposed representation through a singer-class discrimination experiment in which we show that our proposed model can extract the dynamic properties of sung melodies shared by a group of singers. In the next section, we propose the stochastic phase plane (SPP) as a stochastic representation of the melodic contour and show how singing ornamentations are modeled by the proposed SPP. In Section 3, we experimentally show the effectiveness of our proposed method through singer-class discrimination experiments. Section 4 discusses the obtained results and concludes this paper.
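
As an illustration of the phase-plane idea described above, the following minimal sketch maps an F0 contour to points (F0, ΔF0). It is not the authors' implementation: the central-difference estimate of ΔF0, the 1 ms frame period, the cent scale, and the synthetic 6 Hz vibrato contour are all assumptions made for this example, and the names `phase_plane` and `synthetic_vibrato` are hypothetical.

```python
import math


def phase_plane(f0, dt):
    """Map an F0 contour to phase-plane points (F0, dF0/dt).

    The derivative is estimated with a central difference, which is an
    assumption of this sketch, so the first and last frames are dropped.
    """
    points = []
    for t in range(1, len(f0) - 1):
        delta = (f0[t + 1] - f0[t - 1]) / (2.0 * dt)
        points.append((f0[t], delta))
    return points


def synthetic_vibrato(n_frames, dt, rate_hz=6.0, extent_cents=50.0):
    """Hypothetical test contour: sinusoidal deviation (in cents) around a note."""
    omega = 2.0 * math.pi * rate_hz
    return [extent_cents * math.sin(omega * k * dt) for k in range(n_frames)]


dt = 0.001  # assumed frame period of 1 ms
f0 = synthetic_vibrato(1000, dt)
trajectory = phase_plane(f0, dt)  # list of (F0, delta-F0) pairs
```

Under these assumptions a steady vibrato traces a closed, roughly elliptical orbit in the phase plane, whereas a note transition would leave that orbit. A GMM fitted to such points, as the abstract describes, would then summarize where a singer's trajectories concentrate.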

[1]  Masuzo Yanagida,et al.  Variability of Vibrato-A Comparative Study between Japanese Traditional Singing and , 2004 .

[2]  I. Nakayama,et al.  Comparative studies on vocal expressions in Japanese traditional and Western classical-style singing using common verse , 2004 .

[3]  Hironori Kitakaze,et al.  Perception of synthesized singing voices with fine fluctuations in their fundamental frequency contours , 2000, INTERSPEECH.

[4]  Geoffroy Peeters,et al.  Singing voice detection in music tracks using direct voice vibrato detection , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[5]  Hideki Kasuya,et al.  F0 Dynamics in Singing: Evidence from the Data of a Baritone Singer , 2004, IEICE Trans. Inf. Syst.

[6]  Christophe d'Alessandro,et al.  The pitch of short‐duration vibrato tones , 1994 .

[7]  S Iwata,et al.  Aerodynamic study of vibrato and voluntary 'straight tone' pairs in singing. , 1971, Folia phoniatrica.

[8]  Haizhou Li,et al.  Exploring Vibrato-Motivated Acoustic Features for Singer Identification , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[9]  Masataka Goto,et al.  A real-time filled pause detection system for spontaneous speech recognition , 1999, EUROSPEECH.

[10]  David Gerhard.  Pitch Track Target Deviation in Natural Singing , 2005, ISMIR.

[11]  Masataka Goto,et al.  Speech-to-Singing Synthesis: Converting Speaking Voices to Singing Voices by Controlling Acoustic Features Unique to Singing Voices , 2007, 2007 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.

[12]  Howard B. Rothman,et al.  Acoustic variability in vibrato and its perceptual significance , 1987 .

[13]  E. Thomas Doherty,et al.  Acoustic characteristics of vocal oscillations: Vibrato, exaggerated vibrato, trill, and trillo , 1988 .

[14]  John F. Michel,et al.  Vibrato and pitch transitions , 1987 .

[15]  Masataka Goto,et al.  A Stochastic Representation of the Dynamics of Sung Melody , 2007, ISMIR.

[16]  Masataka Goto,et al.  An automatic singing skill evaluation method for unknown melodies using pitch interval accuracy and vibrato features , 2006, INTERSPEECH.

[17]  Keikichi Hirose,et al.  Prosodic Modeling of Nagauta Singing and Its Evaluation , 2004 .

[18]  S. Iwata,et al.  Aerodynamic Study of Vibrato and Voluntary ‘Straight Tone’ Pairs in Singing , 1971 .