A Weighted Superposition of Functional Contours Model for Modelling Contextual Prominence of Elementary Prosodic Contours

How speech prosody encodes linguistic, paralinguistic and non-linguistic information via multiparametric representations of the speech signal remains an open question. The Superposition of Functional Contours (SFC) model proposes to decompose prosody into elementary multiparametric functional contours through the iterative training of neural network contour generators using analysis-by-synthesis. Each generator is responsible for computing multiparametric contours that encode one given piece of linguistic, paralinguistic or non-linguistic information over a variable scope of rhythmic units. The outputs of all generators are then overlapped and added to produce the prosody of the utterance. We propose an extension of the contour generators, the weighted SFC (WSFC) model, that allows them to model the prominence of the elementary contours based on contextual information. WSFC jointly learns the patterns of the elementary multiparametric functional contours and their weights, which depend on the contours' contexts. The experimental results show that WSFC can successfully capture contour prominence and thus improve SFC modelling performance. WSFC is also shown to be effective at modelling the impact of attitudes on the prominence of functional contours cueing syntactic relations in French, and the impact of emphasis on the prominence of tone contours in Chinese.
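The weighted overlap-and-add step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Contour` class, the `superpose` function and the toy values are hypothetical. Each contour is reduced to a single parameter per rhythmic unit, and the weight is a fixed number, whereas in the model the contours are multiparametric and both contours and weights are produced by trained neural network generators from context.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Contour:
    """Hypothetical stand-in for one elementary functional contour."""
    start: int            # index of the first rhythmic unit in the scope
    values: List[float]   # one value per rhythmic unit (e.g. an f0 offset)
    weight: float = 1.0   # contextual prominence; 1.0 gives unweighted SFC


def superpose(n_units: int, contours: List[Contour]) -> List[float]:
    """Overlap-add the weighted contours over an utterance of n_units units."""
    prosody = [0.0] * n_units
    for c in contours:
        for offset, value in enumerate(c.values):
            # Each contour contributes only within its own scope.
            prosody[c.start + offset] += c.weight * value
    return prosody


# Toy utterance with four rhythmic units (values are made up).
declination = Contour(start=0, values=[2.0, 1.0, 0.0, -1.0])
tone = Contour(start=1, values=[1.0, -1.0], weight=0.5)  # reduced prominence
result = superpose(4, [declination, tone])
print(result)
```

Setting every weight to 1.0 gives the plain sum of overlapping contours, so in this sketch the weight is the only thing that separates the weighted variant from the unweighted one.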
