Perceptual and acoustic impacts of aberrant properties of electrolaryngeal speech

Advanced laryngeal cancer is often treated by surgical removal of the larynx (laryngectomy), which leaves patients unable to produce normal voice and speech. Laryngectomy patients must rely on an alternative means of producing voice and speech, the most common being an electrolarynx (EL). The EL is a small, hand-held, electromechanical device that acoustically excites the vocal tract when held against the neck or at the lips. While the EL provides a serviceable means of communication, the resulting speech has several shortcomings in both intelligibility and speech quality. Previous studies have identified and tried to correct individual acoustic properties associated with the abnormal quality of EL speech, but with only limited success. There remains uncertainty about (1) which components of the EL speech acoustic signal contribute most to its abnormal quality and (2) what kinds of acoustic enhancements would be most effective in improving the quality of EL speech. Using a combination of listening experiments, acoustic analysis, and acoustic modeling, this thesis investigated the perceptual and acoustic impacts of several aberrant properties of EL speech, with the overall goal of using the results to direct future EL speech improvement efforts. Perceptual experiments in which 10 listeners judged the naturalness of differently enhanced versions of EL speech demonstrated that adding pitch information would produce the most benefit. Removing the EL self-noise and correcting for the lack of low-frequency energy would also improve EL speech, but to a lesser extent. However, listeners also judged monotonous normal speech to be more natural than any version of EL speech, indicating that other abnormal properties of EL speech contribute to its unnatural quality.
An acoustic analysis of a corpus of pre- and post-laryngectomy speech revealed that changes in vocal tract anatomy produce narrower formant bandwidths and spectral zeros that alter the spectral properties of EL speech. Vocal tract modeling confirmed that these spectral zeros are a function of EL placement, and thus their effects will vary from user to user. Although the addition of pitch information was associated with the greatest improvement in EL speech quality, its implementation is not currently possible because it would require access to underlying linguistic and/or neural processes. Based on these findings, it was concluded that an enhancement algorithm that corrects for the low-frequency deficit, the interference of the EL self-noise, the narrower formant bandwidths, and the effect of the source location should produce EL speech whose quality surpasses what is currently available. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
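The dependence of the spectral zeros on EL placement can be illustrated with a textbook idealization. This is not the vocal tract model used in the thesis; it is a minimal sketch that assumes a uniform, lossless tube closed at its lower end and open at the lips, driven by a volume-velocity source at a distance `source_pos_cm` above the closed end. Under those assumptions the source-to-lips transfer function is cos(k·x_s)/cos(k·L), so the poles depend only on the tube length while the zeros fall at odd multiples of c/(4·x_s). The function names, the 17.5 cm tube length, and the speed-of-sound value are illustrative assumptions.

```python
import math

# Assumed speed of sound in warm, moist air (cm/s); an illustrative value.
SPEED_OF_SOUND_CM_S = 35000.0


def _odd_quarter_wave_freqs(distance_cm, f_max_hz):
    """Frequencies (2n - 1) * c / (4 * distance) that do not exceed f_max_hz."""
    if distance_cm <= 0:
        return []
    freqs = []
    n = 1
    while True:
        f = (2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * distance_cm)
        if f > f_max_hz:
            return freqs
        freqs.append(f)
        n += 1


def tube_resonances(length_cm, f_max_hz):
    """Poles of the idealized closed-open uniform tube (independent of the source)."""
    return _odd_quarter_wave_freqs(length_cm, f_max_hz)


def source_zeros(source_pos_cm, f_max_hz):
    """Zeros introduced by exciting the tube source_pos_cm above the closed end."""
    return _odd_quarter_wave_freqs(source_pos_cm, f_max_hz)


def transfer_magnitude(freq_hz, length_cm, source_pos_cm):
    """|U_lips / U_source| = |cos(k * x_s) / cos(k * L)| for the lossless tube."""
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND_CM_S
    return abs(math.cos(k * source_pos_cm) / math.cos(k * length_cm))


if __name__ == "__main__":
    print("poles:", tube_resonances(17.5, 4000.0))
    for x_s in (3.0, 5.0, 8.0):  # hypothetical EL placements
        print(f"source at {x_s} cm -> zeros:", source_zeros(x_s, 4000.0))
```

In this idealization, moving the source farther from the closed end lowers the zero frequencies while leaving the poles fixed, which is one simple way to see why the spectral effect of placement would differ between users. A realistic model would also need losses, a non-uniform area function, and the neck-wall transmission path.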
