A multi-layer F0 model for singing voice synthesis using a B-spline representation with intuitive controls

In the singing voice, the fundamental frequency (F0) carries not only the melody, but also musical style, personal expressivity, and other characteristics specific to the voice production mechanism. F0 modeling is therefore critical for natural-sounding and expressive synthesis. In addition, for artistic purposes, composers need control over the expressive parameters of the F0 curve, which many current approaches do not provide. This paper presents a novel parametric F0 model for singing voice synthesis with intuitive control of expressive parameters. The proposed approach treats the various F0 variations of the singing voice as separate layers and uses B-splines to model the melodic component. The model has been implemented in a concatenative singing voice synthesis system, and its perceived naturalness has been evaluated through listening tests. The validity of each layer is first evaluated independently, and the full model is then compared to real F0 curves from professional singers. The results of these tests suggest that the model is suitable for producing natural and expressive F0 contours.
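To make the layered idea concrete, the following is a minimal sketch and not the implementation described in this paper. It assumes a melodic layer given by a uniform cubic B-spline over control values in cents, plus a single sinusoidal vibrato layer; the function names, the choice of a vibrato layer, and the default values (5.5 Hz rate, 50 cents depth, 440 Hz reference, 100 frames per second) are all illustrative assumptions.

```python
import math


def cubic_bspline(ctrl, u):
    """Evaluate a uniform cubic B-spline with control values `ctrl` at u in [0, 1].

    Assumption: uniform knots, at least 4 control values.
    """
    n_seg = len(ctrl) - 3               # number of polynomial segments
    x = min(u * n_seg, n_seg - 1e-9)    # map u onto the segment axis, clamp the end
    i = int(x)                          # segment index
    t = x - i                           # local parameter in [0, 1)
    # Uniform cubic B-spline basis functions (they sum to 1 for any t).
    b0 = (1 - t) ** 3 / 6
    b1 = (3 * t ** 3 - 6 * t ** 2 + 4) / 6
    b2 = (-3 * t ** 3 + 3 * t ** 2 + 3 * t + 1) / 6
    b3 = t ** 3 / 6
    return b0 * ctrl[i] + b1 * ctrl[i + 1] + b2 * ctrl[i + 2] + b3 * ctrl[i + 3]


def vibrato(t, rate_hz, depth_cents):
    """Hypothetical vibrato layer: a plain sinusoid in cents."""
    return depth_cents * math.sin(2 * math.pi * rate_hz * t)


def f0_contour(ctrl_cents, duration_s, frame_rate_hz=100,
               vib_rate_hz=5.5, vib_depth_cents=50.0, ref_hz=440.0):
    """Sum the layers in cents, then convert to Hz relative to `ref_hz`."""
    n_frames = int(duration_s * frame_rate_hz)
    contour = []
    for k in range(n_frames):
        t = k / frame_rate_hz
        melodic = cubic_bspline(ctrl_cents, t / duration_s)  # melodic layer
        cents = melodic + vibrato(t, vib_rate_hz, vib_depth_cents)
        contour.append(ref_hz * 2 ** (cents / 1200))
    return contour


# A smooth two-semitone upward transition carrying vibrato.
curve = f0_contour([0, 0, 0, 200, 200, 200], duration_s=1.0)
```

Keeping the layers additive in cents means each one can be adjusted separately: moving a control value reshapes the note transition without touching the vibrato, and changing the vibrato depth leaves the melodic shape intact.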
