Quality of speech produced by analysis-synthesis

Abstract: We review factors that have affected the synthesis of high-quality speech by analysis-synthesis. The influence of a selected subset of these factors on the quality of synthesized speech was evaluated through listener preference judgements that compared natural speech with the synthetic speech of two synthesizers: a linear prediction coding (LPC) synthesizer and a formant synthesizer. Several synthesizer excitation waveforms were considered. These waveforms included critical parameters that replicated selected glottal timing events, e.g., the instants of glottal closure and glottal opening. In addition, identifying voiced, unvoiced, mixed-excitation, and silent intervals in the speech waveform and measuring the fundamental frequency of voicing contributed to the synthesis of high-quality speech. A two-channel approach to speech analysis is recommended to aid the automatic processing of speech: one channel is the conventional acoustic signal and the other is the electroglottogram (EGG).
