Speech identification based on temporal fine structure cues.

The contribution of temporal fine structure (TFS) cues to consonant identification was assessed in normal-hearing listeners using two speech-processing schemes designed to remove temporal envelope (E) cues. Stimuli were processed vowel-consonant-vowel speech tokens. Carrier signals were derived from the analytic signal at the output of each filter in a bank of analysis filters. The "PM" and "FM" processing schemes estimated the phase-modulation and frequency-modulation function, respectively, of each carrier signal and applied it to a sinusoidal carrier at the analysis-filter center frequency. In the FM scheme, processed signals were further restricted to the analysis-filter bandwidth. A third scheme retaining only E cues from each band was used for comparison. Stimuli processed with the PM and FM schemes were found to be highly intelligible (50-80% correct identification) over a variety of experimental conditions designed to affect the putative reconstruction of E cues subsequent to peripheral auditory filtering. Analysis of confusions between consonants showed that the contribution of TFS cues was greater for place than for manner of articulation, whereas the converse was observed for E cues. Taken together, these results indicate that TFS cues convey important phonetic information that is not solely a consequence of E reconstruction.
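
The idea behind removing E cues while keeping the phase modulation of a band can be illustrated with a minimal single-band sketch. This is not the authors' implementation: the analysis filterbank is omitted, the function names `analytic_signal` and `pm_process_band` are hypothetical, and the phase-modulation function is assumed to be the unwrapped analytic-signal phase minus the linear phase of a sinusoid at the band center frequency `fc`.

```python
import numpy as np


def analytic_signal(x):
    """FFT-based analytic signal: zero the negative frequencies, double the positive ones."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep DC once
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep Nyquist once
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def pm_process_band(band, fc, fs):
    """Discard the envelope of one band and keep its phase modulation about fc.

    band : real-valued output of one analysis filter (assumed already band-limited)
    fc   : center frequency of that filter in Hz (assumption: used as carrier frequency)
    fs   : sampling rate in Hz
    """
    t = np.arange(len(band)) / fs
    z = analytic_signal(band)
    envelope = np.abs(z)                          # E cue, discarded in the output
    phase = np.unwrap(np.angle(z))                # instantaneous phase of the band
    pm = phase - 2 * np.pi * fc * t               # phase modulation relative to fc
    tfs_only = np.cos(2 * np.pi * fc * t + pm)    # unit-amplitude carrier at fc
    return tfs_only, envelope, pm
```

For a synthetic band with slow amplitude modulation and a small phase modulation, the output of `pm_process_band` has an essentially flat envelope while the recovered `pm` follows the imposed phase modulation. An FM variant would, under the same assumptions, differentiate the phase to obtain instantaneous frequency and then band-limit the result to the analysis-filter bandwidth.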
