Speech synthesis

[Figure: high-level cognitive processes supply an utterance plan to low-level physical rendering processes; rendering is subject to internal and external environmental factors and includes perceptual trialling within the speaker; the output is an acoustic signal optimised for perception (speech judged to be natural).]

In the diagram we see that high-level cognitive processes provide an utterance plan for low-level rendering. Part of the rendering process involves perceptually trialling a hypothesised acoustic signal; in a human speaker this is done within the speaker's mind. Rendering is then adjusted iteratively so that the final acoustic signal is optimised for perception. By pre-trialling the 'sound', the speaker has ensured that it is appropriate for perception. In the case of a synthesiser, the same step ensures that the signal will be judged natural (at least from this point of view).
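The trial-and-adjust loop described above can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' model: the `Rendering` class, the `effort` parameter, the `predict_perception` scoring rule, and all numeric values are hypothetical stand-ins for the rendering settings and the internal perceptual model.

```python
from dataclasses import dataclass


@dataclass
class Rendering:
    """Hypothetical low-level rendering settings for one utterance plan."""
    plan: str
    effort: float  # toy articulatory precision, from 0.0 to 1.0


def predict_perception(rendering: Rendering, noise: float) -> float:
    """Toy internal perceptual model scoring a hypothesised signal (0..1).

    Assumption for illustration only: more effort raises the score and
    more environmental noise lowers it.
    """
    return max(0.0, min(1.0, rendering.effort - 0.5 * noise))


def render_with_trialling(plan, noise, target=0.6, step=0.1, max_trials=20):
    """Trial the hypothesised signal and adjust rendering until it scores
    at least `target`, or until `max_trials` trials have been made."""
    rendering = Rendering(plan=plan, effort=0.3)
    score = predict_perception(rendering, noise)
    trials = 1
    while score < target and trials < max_trials:
        # Adjust the rendering, then trial the new hypothesised signal.
        rendering.effort = min(1.0, rendering.effort + step)
        score = predict_perception(rendering, noise)
        trials += 1
    return rendering, score, trials


if __name__ == "__main__":
    for noise in (0.0, 0.6):
        rendering, score, trials = render_with_trialling("hello", noise)
        print(f"noise={noise}: effort={rendering.effort:.1f}, "
              f"score={score:.1f}, trials={trials}")
```

In this toy setup a noisier environment forces more trials and a higher final effort before the signal is accepted, which mirrors the role the environmental factors play in the diagram.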
