Spatiotemporal Encoding of Vowels in Noise Studied with the Responses of Individual Auditory-Nerve Fibers

The neural basis for the robust speech perception exhibited by human listeners (e.g., across sound levels or background noises) remains unknown. The encoding of spectral shape based on auditory-nerve (AN) discharge rate degrades significantly at high sound levels, particularly in high-spontaneous-rate (SR) fibers (Sachs and Young 1979). However, continued support for rate coding has come from the observations that robust spectral coding occurs in some low-SR fibers for vowels in quiet and that rate-difference profiles provide enough information to account for behavioral discrimination of vowels (Conley and Keilson 1995; May, Huang, Le Prell, and Hienz 1996). Despite this support, it is clear that temporal codes are more robust than rate codes (Young and Sachs 1979), especially in noise (Delgutte and Kiang 1984; Sachs, Voigt, and Young 1983). Sachs et al. (1983) showed that rate coding in low-SR fibers was significantly degraded at a moderate signal-to-noise ratio at which human perception is robust. In contrast, temporal coding based on the average localized synchronized rate (ALSR) remained robust. Although temporal coding based on ALSR is often shown to be robust, evidence for neural mechanisms that could decode these cues is limited. Spatiotemporal mechanisms have been proposed for decoding these types of cues (e.g., Carney, Heinz, Evilsizer, Gilkey, and Colburn 2002; Deng and Geisler 1987; Shamma 1985). However, the detailed evaluation of spatiotemporal mechanisms has been limited primarily to modeling studies because of the difficulties associated with the large population responses that are required to study spatiotemporal coding (e.g., see Palmer 1990). For example, Deng and Geisler (1987) used a transmission-line-based AN model to suggest that spectral coding based on the peak cross-correlation between adjacent best-frequency (BF) channels was robust in the presence of background noise.
In the present study, spectral coding of vowels in noise based on rate, ALSR, and a simple cross-BF coincidence detection scheme is evaluated from the responses of single AN fibers. By using data from a single AN fiber, many of the difficulties associated with large-population studies are eliminated.
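
As an illustration of what a simple cross-BF coincidence detection scheme might compute, the following minimal Python sketch counts near-simultaneous spikes in two spike trains taken to represent neighboring BF channels. It is not the analysis used in the study: the function names, the coincidence window, and the example spike times are all hypothetical, and how the two trains are obtained is not specified here.

```python
from bisect import bisect_left, bisect_right


def coincidence_count(spikes_a, spikes_b, window):
    """Count spike pairs (one spike from each train) whose times
    differ by at most `window` seconds.

    spikes_a, spikes_b: spike times in seconds (hypothetical inputs,
    assumed here to come from two nearby-BF channels).
    """
    b_sorted = sorted(spikes_b)
    total = 0
    for t in spikes_a:
        # Index range of spikes in train B that fall in [t - window, t + window].
        lo = bisect_left(b_sorted, t - window)
        hi = bisect_right(b_sorted, t + window)
        total += hi - lo
    return total


def coincidence_rate(spikes_a, spikes_b, window, duration):
    """Coincidences per second over a stimulus of `duration` seconds."""
    return coincidence_count(spikes_a, spikes_b, window) / duration


# Made-up spike times (seconds), for illustration only.
train_a = [0.010, 0.020, 0.030]
train_b = [0.0101, 0.0250, 0.0299]

n_coincident = coincidence_count(train_a, train_b, window=0.0002)
rate = coincidence_rate(train_a, train_b, window=0.0002, duration=0.5)
print(n_coincident, rate)
```

In this sketch, a higher coincidence rate indicates that the two channels tend to fire at nearly the same moments. The window width sets how strict the detector is and would need to be chosen for the data at hand.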

[1] L. Carney, et al. A phenomenological model for the responses of auditory-nerve fibers: I. Nonlinear tuning with compression and suppression. The Journal of the Acoustical Society of America, 2001.

[2] P. Joris. Interaural Time Sensitivity Dominated by Cochlea-Induced Envelope Patterns. The Journal of Neuroscience, 2003.

[3] Bertrand Delgutte, et al. Spatio-Temporal Representation of the Pitch of Complex Tones in the Auditory Nerve. 2007.

[4] Grace I Wang, et al. Spatio-temporal representation of the pitch of complex tones in the auditory nerve and cochlear nucleus. 2007.

[5] R. Fay, et al. Speech Processing in the Auditory System. Springer Handbook of Auditory Research, 2010.

[6] B. Delgutte, et al. Speech coding in the auditory nerve: V. Vowels in background noise. The Journal of the Acoustical Society of America, 1984.

[7] Philip X Joris, et al. Binaural and cochlear disparities. Proceedings of the National Academy of Sciences, 2006.

[8] Eric D Young, et al. Response growth with sound level in auditory-nerve fibers after noise-induced hearing loss. Journal of neurophysiology, 2004.

[9] A R Palmer, et al. The representation of the spectra and fundamental frequencies of steady-state single- and double-vowel sounds in the temporal discharge patterns of guinea pig cochlear-nerve fibers. The Journal of the Acoustical Society of America, 1990.

[10] M. Sachs, et al. Representation of steady-state vowels in the temporal aspects of the discharge patterns of populations of auditory-nerve fibers. The Journal of the Acoustical Society of America, 1979.

[11] Marcel van der Heijden, et al. Dependence of binaural and cochlear “best delays” on characteristic frequency. 2005.

[12] S. Shamma. Speech processing in the auditory system. I: The representation of speech sounds in the responses of the auditory nerve. The Journal of the Acoustical Society of America, 1985.

[13] M. Sachs, et al. Encoding of steady-state vowels in the auditory nerve: representation in terms of discharge rate. The Journal of the Acoustical Society of America, 1979.

[14] E D Young, et al. Auditory nerve representation of vowels in background noise. Journal of neurophysiology, 1983.

[15] Laurel H. Carney, et al. Auditory Phase Opponency: A Temporal Model for Masked Detection at Low Frequencies. 2002.

[16] Robert D Hienz, et al. Vowel Formant Frequency Discrimination in Cats: Comparison of Auditory Nerve Representations and Psychophysical Thresholds. Auditory neuroscience, 1996.

[17] S. Keilson, et al. Rate representation and discriminability of second formant frequencies for /ε/-like steady-state vowels in cat auditory nerve. The Journal of the Acoustical Society of America, 1995.

[18] Birger Kollmeier, et al. Hearing - from sensory processing to perception. 2007.

[19] M. Sachs, et al. Vowel representations in the ventral cochlear nucleus of the cat: effects of level, background noise, and behavioral state. Journal of neurophysiology, 1998.

[20] C. D. Geisler, et al. A composite auditory model for processing speech sounds. The Journal of the Acoustical Society of America, 1987.

[21] S. Shamma. Speech processing in the auditory system. II: Lateral inhibition and the central processing of speech evoked activity in the auditory nerve. The Journal of the Acoustical Society of America, 1985.