Evaluating the Quality of Synthetic Speech

The utility of text-to-speech systems depends on two aspects of their performance: intelligibility and naturalness. Evaluating these aspects of synthetic speech provides important information about how a speech synthesizer performs in comparison to competing products. It can be important for developers to understand where the relative strengths and weaknesses of a particular synthesizer lie, since this knowledge can guide development as well as marketing. Diagnostic evaluation can directly assist the development effort by pinpointing specific problems in synthesis that can be redressed by engineering solutions. This chapter will outline the principles that govern the design of tests to measure the performance of a text-to-speech system. Moreover, we will discuss the factors that influence performance on these tests. Finally, we will review some performance comparison tests used for text-to-speech systems.
