Speech synthesis systems: disadvantages and limitations

Current speech synthesis systems can be used successfully for a wide range of purposes. However, the available synthesizers still have serious limitations, many of which can be identified and resolved. The aim of this paper is to present the current state of development of speech synthesis systems and to examine their drawbacks and limitations. The paper discusses the current classification, construction and functioning of speech synthesis systems, which gives an insight into the synthesizers implemented so far. The analysis of disadvantages and limitations focuses on identifying the weak points of these systems, namely: the impact of emotions and prosody, spontaneous speech in terms of naturalness and intelligibility, preprocessing and text analysis, the problem of ambiguity, natural-sounding output, adaptation to the situation, the variety of systems, sparsely spoken languages, speech synthesis for older people, and some other minor limitations. Solving these problems stimulates further development of the speech synthesis domain.
