The SP2 SCOPES Project on Speech Prosody

This is an overview of a Joint Research Project within the Scientific Co-operation between Eastern Europe and Switzerland (SCOPES) Program of the Swiss National Science Foundation (SNSF) and the Swiss Agency for Development and Cooperation (SDC). Within the SP2 SCOPES Project on Speech Prosody, the four partners aim to collaborate over the next two years on the subject of speech prosody and to advance the extraction, processing, modeling and transfer of prosody for a large portfolio of European languages: French, German, Italian, English, Hungarian, Serbian, Croatian, Bosnian, Montenegrin, and Macedonian. The four intertwined research plans are expected to produce synergies that will build a foundation for submitting strong joint proposals for EU funding.
