Multiresolution spectrotemporal analysis of complex sounds.

We describe a computational model of auditory analysis inspired by psychoacoustical and neurophysiological findings in the early and central stages of the auditory system. The model provides a unified multiresolution representation of the spectral and temporal features likely to be critical in the perception of sound. Simplified, more specifically tailored versions of this model have already been validated by successful application in the assessment of speech intelligibility [Elhilali et al., Speech Commun. 41(2-3), 331-348 (2003); Chi et al., J. Acoust. Soc. Am. 106, 2719-2732 (1999)] and in explaining the perception of monaural phase sensitivity [R. Carlyon and S. Shamma, J. Acoust. Soc. Am. 114, 333-348 (2003)]. Here we provide a more complete mathematical formulation of the model, illustrate how complex signals are transformed through its successive stages, and relate it to comparable existing models of auditory processing. Furthermore, we outline several reconstruction algorithms that resynthesize the sound from the model output, so as to evaluate the fidelity of the representation and the contribution of different features and cues to the sound percept.
