Stimulus design for auditory neuroethology using state space modeling and the extended Kalman smoother

A new method for designing vocalization-based stimuli for experiments in auditory neurophysiology is described. This analysis-synthesis technique uses a state-space statistical signal model and the extended Kalman smoother to track the frequency, amplitude, and phase of harmonically related components in recorded vocalizations. With the same state-space model, the tracked parameters can then be used to synthesize the original vocalizations as well as random or deterministic variants of them. The method is shown to outperform short-time Fourier transform-based frequency tracking on both noisy and noise-free synthetic test signals. It is further shown to accurately track recorded hummingbird, human, and bat vocalizations, and the synthesis stage removes recording artifacts such as noise, echo, and digital aliasing.
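The synthesis half of such an analysis-synthesis scheme can be illustrated with a minimal Python sketch. It is not the state-space model or the extended Kalman smoother described above; it only assumes that a fundamental-frequency track and per-harmonic amplitude tracks are already available, and it rebuilds a signal as a sum of harmonically related cosines whose phase is the running integral of the frequency. The function name `synthesize_harmonics`, its parameters, and the example glide are all hypothetical choices made for illustration.

```python
import numpy as np


def synthesize_harmonics(f0, amps, fs, phase0=None):
    """Sum-of-harmonics synthesis from tracked parameters (illustrative only).

    f0     : (N,) instantaneous fundamental frequency in Hz
    amps   : (K, N) instantaneous amplitude of harmonics 1..K
    fs     : sampling rate in Hz
    phase0 : optional (K,) starting phase offsets in radians
    """
    f0 = np.asarray(f0, dtype=float)
    amps = np.asarray(amps, dtype=float)
    n_harm = amps.shape[0]
    if phase0 is None:
        phase0 = np.zeros(n_harm)
    phase0 = np.asarray(phase0, dtype=float)

    # Phase of the fundamental: discrete running integral of frequency.
    theta = 2.0 * np.pi * np.cumsum(f0) / fs

    # Harmonic k advances k times as fast as the fundamental.
    k = np.arange(1, n_harm + 1)[:, None]
    components = amps * np.cos(k * theta[None, :] + phase0[:, None])
    return components.sum(axis=0)


# Hypothetical example: a one-second upward glide with two harmonics.
fs = 8000
t = np.arange(fs) / fs
f0_track = 100.0 + 20.0 * t
amp_tracks = np.vstack([np.ones_like(t), 0.5 * np.ones_like(t)])
y = synthesize_harmonics(f0_track, amp_tracks, fs)
```

Because the output is built only from the tracked parameters, anything in the recording that is not captured by those tracks (for example, broadband noise) is absent from the result. Scaling or perturbing `f0_track` or `amp_tracks` before synthesis is one simple way to produce deterministic or random variants.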
