An improved pitch contour formulation for Malay language storytelling Text-to-Speech (TTS)

In this paper, an improved pitch contour formulation is introduced by modifying the existing sinusoidal pitch contour function. The aim is to convert neutral speech into storytelling speech in the Malay language. Our speech datasets (neutral and storytelling speech) were recorded by one male and one female professional speaker. They contain 116 sentences, 1,164 words, and 2,732 syllables. For the storytelling speech, 124 prominent syllables were detected using the Prosogram tool. These prominent syllables were further categorized into six pitch contour clusters. Distance, measured as one minus the Pearson correlation, was used to assess the similarity of the proposed pitch contour formulae to the original storytelling pitch contour. The proposed sinusoidal pitch contour function was also compared with the existing pitch contour function used in previous work. The results show that the proposed pitch contour formulation performs better than the existing pitch contour formulae.
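
The distance measure named in the abstract, one minus the Pearson correlation, can be illustrated with a minimal sketch. The function name `pearson_distance` and the F0 values below are hypothetical and chosen only for illustration; they are not taken from the paper's data, and the paper's sinusoidal formulae are not reproduced here.

```python
import math


def pearson_distance(a, b):
    """Return 1 - r, where r is the Pearson correlation of two contours.

    0 means the contours have the same shape, 1 means no linear
    relationship, and 2 means the shapes are exactly inverted.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("contours must have equal length of at least 2")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a flat contour has no defined correlation")
    return 1.0 - cov / math.sqrt(var_a * var_b)


# Hypothetical F0 samples (Hz) for one prominent syllable.
target = [180.0, 195.0, 210.0, 205.0, 190.0]        # original storytelling contour
candidate_a = [170.0, 185.0, 200.0, 195.0, 180.0]   # same shape, 10 Hz lower
candidate_b = [210.0, 205.0, 190.0, 185.0, 180.0]   # falling contour

print(pearson_distance(target, candidate_a))
print(pearson_distance(target, candidate_b))
```

Because the Pearson correlation ignores constant offsets and positive scaling, this distance compares the shape of two contours rather than their absolute pitch level, so a generated contour that rises and falls in the same places as the target scores close to zero even if it sits a few hertz higher or lower.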
