Physiological foundations of temporal integration in the perception of speech

Speech understanding involves the integration and identification of acoustic cues distributed over multiple time scales. These range from the sub-millisecond intervals associated with spectral estimates, through the few-millisecond periods of the fundamental frequency (f0) and the tens of milliseconds spanned by phonemic and syllabic segments, to the longer time scales involved in perceiving words and sentences. Much of what is known about the auditory representation of these cues comes from experimental studies in various animal species. Especially well studied are the early stages of the cochlea and cochlear nucleus and the later cortical stages (Sachs & Young, 1979; Young & Sachs, 1979; Young, 1997, Chap. 4; Clarey, Barone, & Imig, 1992; Calhoun & Schreiner, 1995; Shamma, Versnel, & Kowalski, 1995; Kowalski, Depireux, & Shamma, 1996; deCharms, Blake, & Merzenich, 1998). By contrast, the physiological underpinnings of linguistic processes remain highly elusive. This is despite extensive investigations over the last decade employing a host of new fast-imaging techniques in human subjects and computational models (Poeppel, 2001; Horwitz, Friston, & Taylor, 2000). These techniques do not yet have the resolution to give clear insight into the responses and representations of single units and neural circuits. Consequently, the review below concerns conceptions of auditory processes operating at the faster time scales found in the early auditory pathway, where animal experimentation is possible. Furthermore, these conceptions are extrapolated from experiments that employ stimuli simpler than speech (such as tones and noise with various amplitude and frequency modulations). Hence the models discussed are not specific to speech perception.
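The simple stimuli mentioned above, tones with amplitude or frequency modulation, are easy to synthesize. The sketch below is a minimal pure-Python illustration, not the stimulus code of any study cited here. It generates a sinusoidally amplitude-modulated (SAM) tone and a logarithmic frequency-modulated (FM) sweep. The function names, the sampling rate, and the choice to express sweep rate in octaves per second are assumptions made for illustration. The sign of the rate sets the sweep direction and its magnitude sets the sweep speed.

```python
import math

def sam_tone(fc_hz, fm_hz, depth, dur_s, fs_hz):
    """Sinusoidally amplitude-modulated (SAM) tone:
    x[n] = (1 + m * sin(2*pi*fm*t)) * sin(2*pi*fc*t), with t = n / fs."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("modulation depth must lie in [0, 1]")
    n_samples = int(round(dur_s * fs_hz))
    out = []
    for n in range(n_samples):
        t = n / fs_hz
        envelope = 1.0 + depth * math.sin(2 * math.pi * fm_hz * t)
        out.append(envelope * math.sin(2 * math.pi * fc_hz * t))
    return out

def log_sweep_phase(f_start_hz, rate_oct_s, t_s):
    """Phase (radians) of a logarithmic FM sweep: the integral of
    2*pi*f(t) from 0 to t, where f(t) = f_start * 2**(rate * t)."""
    if rate_oct_s == 0:
        return 2 * math.pi * f_start_hz * t_s  # constant-frequency tone
    k = rate_oct_s * math.log(2)
    return 2 * math.pi * f_start_hz * math.expm1(k * t_s) / k

def log_sweep_freq(f_start_hz, rate_oct_s, t_s):
    """Instantaneous frequency (Hz) of the sweep at time t_s."""
    return f_start_hz * 2.0 ** (rate_oct_s * t_s)

def log_fm_sweep(f_start_hz, rate_oct_s, dur_s, fs_hz):
    """Sampled sweep. The sign of rate_oct_s sets the direction
    (positive = upward, negative = downward); its magnitude sets the
    speed in octaves per second."""
    n_samples = int(round(dur_s * fs_hz))
    return [math.sin(log_sweep_phase(f_start_hz, rate_oct_s, n / fs_hz))
            for n in range(n_samples)]

# Hypothetical example parameters, chosen only for illustration.
fs = 16000                                   # sampling rate (Hz)
am = sam_tone(1000.0, 10.0, 0.5, 0.1, fs)    # 1 kHz carrier, 10 Hz AM
up = log_fm_sweep(1000.0, 10.0, 0.1, fs)     # 1 kHz -> 2 kHz in 100 ms
down = log_fm_sweep(2000.0, -10.0, 0.1, fs)  # 2 kHz -> 1 kHz in 100 ms
```

The phase is computed in closed form, as the integral of the instantaneous frequency, rather than from the instantaneous frequency directly. This keeps the waveform continuous. A logarithmic sweep covers equal octave steps in equal times. A linear sweep, which changes by a fixed number of hertz per second, is the other common choice; both kinds appear in the title of reference [1].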
Temporal integration in the auditory system in fact refers to the integration of spectro-temporal features over several processing stages. This gives rise to various forms of spectro-temporal selectivity that are thought to be valuable for speech processing. One example is selectivity to the speed and direction of frequency-modulated (FM) tones that resemble formant transitions in speech (Nelken &

[1] I. Nelken et al. Responses to linear and logarithmic frequency-modulated sweeps in ferret primary auditory cortex, 2000, European Journal of Neuroscience.

[2] S. Greenberg et al. Speech intelligibility derived from exceedingly sparse spectral information, 1998, ICSLP.

[3] P. Heil. Representation of sound onsets in the auditory system, 2001, Audiology and Neurotology.

[4] S. A. Shamma et al. Representation of musical timbre in the auditory cortex, 1997.

[5] G. Langner et al. Periodicity coding in the auditory system, 1992, Hearing Research.

[6] K. J. Friston et al. Neural modeling and functional brain imaging: an overview, 2000, Neural Networks.

[7] R. Plomp et al. Effect of temporal envelope smearing on speech reception, 1994, The Journal of the Acoustical Society of America.

[8] S. Shamma et al. Analysis of dynamic spectra in ferret primary auditory cortex. II. Prediction of unit responses to arbitrary dynamic spectra, 1996, Journal of Neurophysiology.

[9] M. Sachs et al. Encoding of steady-state vowels in the auditory nerve: representation in terms of discharge rate, 1979, The Journal of the Acoustical Society of America.

[10] M. Sachs et al. Effect of electrical stimulation of the crossed olivocochlear bundle on auditory nerve response to tones in noise, 1987, Journal of Neurophysiology.

[11] S. Shamma et al. Ripple analysis in ferret primary auditory cortex. I. Response characteristics of single units to sinusoidally rippled spectra, 1994.

[12] S. A. Shamma et al. Spectro-temporal response field characterization with dynamic ripples in ferret primary auditory cortex, 2001, Journal of Neurophysiology.

[13] D. Poeppel et al. New approaches to the neural basis of speech sound processing: introduction to special section on brain and speech, 2001.

[14] S. Shamma et al. Spectro-temporal modulation transfer functions and speech intelligibility, 1999, The Journal of the Acoustical Society of America.

[15] M. Elhilali et al. A spectro-temporal modulation index (STMI) for assessment of speech intelligibility, 2003, Speech Communication.

[16] S. Grossberg et al. Resonant neural dynamics of speech perception, 2003, Journal of Phonetics.

[17] M. Sachs et al. The representations of the steady-state vowel sound /e/ in the discharge patterns of cat anteroventral cochlear nucleus neurons, 1990, Journal of Neurophysiology.

[18] M. Merzenich et al. Optimizing sound features for cortical neurons, 1998, Science.

[19] R. V. Shannon et al. Speech recognition with primarily temporal cues, 1995, Science.

[20] P. Barone et al. Physiology of thalamus and cortex, 1992.

[21] S. Shamma et al. An account of monaural phase sensitivity, 2002, The Journal of the Acoustical Society of America.

[22] M. Sachs et al. Representation of steady-state vowels in the temporal aspects of the discharge patterns of populations of auditory-nerve fibers, 1979, The Journal of the Acoustical Society of America.

[23] G. Shepherd. The synaptic organization of the brain, 1979.

[24] R. Fay et al. The mammalian auditory pathway: Neurophysiology, 1992.

[25] C. E. Schreiner et al. Spectral envelope coding in cat primary auditory cortex: Properties of ripple transfer functions, 1994.

[26] S. Shamma et al. Analysis of dynamic spectra in ferret primary auditory cortex. I. Characteristics of single-unit responses to moving ripple spectra, 1996, Journal of Neurophysiology.