Modelling Japanese intonation using PENTAtrainer2

This paper presents results from modelling Japanese intonation using PENTAtrainer2, an articulatory synthesiser. Our first aim is to show that PENTA, the model on which PENTAtrainer2 is based, can achieve high accuracy in the predictive synthesis of varying intonation contours. We trained the synthesiser on a functionally annotated corpus of 6251 sentences and generated F0 contours for each communicative condition. The accuracy of both speaker-dependent and speaker-independent synthesis, together with naturalness ratings, shows that PENTA is effective in modelling Japanese intonation. This suggests that, once contextual variability is incorporated into a model, multi-functional targets alone can suffice as the prosodic representation, even for a sizeable corpus.
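To make the idea of invariant targets plus contextual variability concrete, here is a minimal sketch. It assumes the quantitative Target Approximation (qTA) formulation used by the PENTAtrainers. In qTA each syllable has a linear pitch target (slope m, height b) that F0 approaches at rate λ, following f0(t) = (c1 + c2·t + c3·t²)·e^(-λt) + m·t + b. The coefficients c1 to c3 are fixed by the F0 value, velocity and acceleration at syllable onset, and each syllable's final state becomes the next syllable's initial state. This state transfer is one way the same target can surface differently in different contexts. The function names, the semitone units, the sample targets, and the use of RMSE and Pearson correlation as accuracy measures are illustrative assumptions. The abstract does not say how accuracy was computed, and the target-learning step is not shown.

```python
import math


def qta_segment(m, b, lam, duration, init, n_points=50):
    """Sample one syllable's F0 under the qTA target-approximation model.

    m, b: linear pitch target (assumed slope in st/s, height in st);
    lam: rate of approach (1/s); init: (f0, velocity, acceleration) at onset.
    Returns the sampled contour and the (f0, velocity, acceleration) at offset.
    """
    f0_0, v0, a0 = init
    # Coefficients fixed by continuity of f0, f0' and f0'' at the onset.
    c1 = f0_0 - b
    c2 = v0 + c1 * lam - m
    c3 = (a0 + 2 * c2 * lam - c1 * lam ** 2) / 2

    def at(t):
        e = math.exp(-lam * t)
        poly = c1 + c2 * t + c3 * t ** 2
        dpoly = c2 + 2 * c3 * t
        f0 = poly * e + m * t + b
        vel = (dpoly - lam * poly) * e + m
        acc = (2 * c3 - 2 * lam * dpoly + lam ** 2 * poly) * e
        return f0, vel, acc

    # Sample [0, duration); the offset itself is the next syllable's onset.
    times = [duration * i / n_points for i in range(n_points)]
    return [at(t)[0] for t in times], at(duration)


def synthesize(targets, durations, init_f0, n_points=50):
    """Chain syllables, carrying each final state into the next syllable."""
    state = (init_f0, 0.0, 0.0)
    contour = []
    for (m, b, lam), dur in zip(targets, durations):
        segment, state = qta_segment(m, b, lam, dur, state, n_points)
        contour.extend(segment)
    return contour


def rmse(pred, ref):
    """Root-mean-square error between two equally sampled contours."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def pearson(pred, ref):
    """Pearson correlation between two equally sampled contours."""
    n = len(ref)
    mp, mr = sum(pred) / n, sum(ref) / n
    cov = sum((p - mp) * (r - mr) for p, r in zip(pred, ref))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    sr = math.sqrt(sum((r - mr) ** 2 for r in ref))
    return cov / (sp * sr)


if __name__ == "__main__":
    # Hypothetical targets (m, b, lam) and durations (s) for three syllables.
    durations = [0.15, 0.20, 0.18]
    base = synthesize([(0.0, 2.0, 40.0), (-20.0, 4.0, 40.0), (0.0, -1.0, 30.0)],
                      durations, init_f0=0.0)
    # Same last two targets, different first target: their surface F0 differs.
    alt = synthesize([(0.0, 6.0, 40.0), (-20.0, 4.0, 40.0), (0.0, -1.0, 30.0)],
                     durations, init_f0=0.0)
    print(f"RMSE = {rmse(alt, base):.2f} st, r = {pearson(alt, base):.3f}")
```

In the usage example, the second and third syllables keep identical targets, yet their sampled F0 differs between the two runs because they start from different carried-over states. In this sketch, that is the sense in which contextual variability is handled by the implementation mechanism rather than by extra targets.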
