Basic auditory processes involved in the analysis of speech sounds

This paper reviews the basic aspects of auditory processing that play a role in the perception of speech. The frequency selectivity of the auditory system, as measured using masking experiments, is described and used to derive the internal representation of the spectrum (the excitation pattern) of speech sounds. The perception of timbre and distinctions in quality between vowels are related to both static and dynamic aspects of the spectra of sounds. The perception of pitch and its role in speech perception are described. Measures of the temporal resolution of the auditory system are described and a model of temporal resolution based on a sliding temporal integrator is outlined. The combined effects of frequency and temporal resolution can be modelled by calculation of the spectro-temporal excitation pattern, which gives good insight into the internal representation of speech sounds. For speech presented in quiet, the resolution of the auditory system in frequency and time usually markedly exceeds the resolution necessary for the identification or discrimination of speech sounds, which partly accounts for the robust nature of speech perception. However, for people with impaired hearing, speech perception is often much less robust.
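As a rough illustration of how an excitation pattern can be derived from auditory-filter shapes, the sketch below computes the output level of a bank of filters in response to a harmonic complex. It is a minimal sketch under stated assumptions, not the method used in the paper: it assumes symmetric, level-independent roex(p) filters, takes the equivalent rectangular bandwidth from the Glasberg and Moore (1990) formula ERB = 24.7(4.37F + 1) with F in kHz, and ignores outer- and middle-ear transmission. The function names and the example stimulus (30 equal-level harmonics of 100 Hz) are hypothetical.

```python
import math


def erb_hz(fc_hz):
    """Equivalent rectangular bandwidth (Hz) of the auditory filter at fc_hz.

    Glasberg and Moore (1990) formula for normal hearing at moderate levels.
    """
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def roex_weight(f_hz, fc_hz):
    """Power weighting applied to a component at f_hz by a filter centred at fc_hz.

    Assumption: a symmetric roex(p) shape, W(g) = (1 + p*g) * exp(-p*g),
    with g the normalised deviation from the centre frequency and p = 4*fc/ERB.
    """
    p = 4.0 * fc_hz / erb_hz(fc_hz)
    g = abs(f_hz - fc_hz) / fc_hz
    return (1.0 + p * g) * math.exp(-p * g)


def excitation_pattern(components, centre_freqs_hz):
    """Return the filter output level (dB) at each centre frequency.

    components: list of (frequency_hz, level_db) sinusoidal components.
    """
    pattern = []
    for fc in centre_freqs_hz:
        power = sum(
            roex_weight(f, fc) * 10.0 ** (level / 10.0) for f, level in components
        )
        # Floor guards against log10(0) if every weight underflows.
        pattern.append(10.0 * math.log10(max(power, 1e-30)))
    return pattern


# Hypothetical stimulus: 30 harmonics of a 100-Hz fundamental, each at 60 dB.
harmonics = [(100.0 * n, 60.0) for n in range(1, 31)]
centres = [200.0, 250.0, 2500.0, 2550.0]
levels = excitation_pattern(harmonics, centres)
for fc, level in zip(centres, levels):
    print(f"{fc:7.0f} Hz: {level:5.1f} dB")
```

Under these assumptions the pattern shows a clear peak-to-dip difference between 200 Hz (on a harmonic) and 250 Hz (between harmonics), whereas the levels at 2500 Hz and 2550 Hz are almost equal. This mirrors the distinction between resolved low harmonics and unresolved high harmonics that underlies the internal representation of vowel spectra.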
