Efficient Coding of Time-Relative Structure Using Spikes

Nonstationary acoustic features provide essential cues for many auditory tasks, including sound localization, auditory stream analysis, and speech recognition. These features are best characterized relative to a precise point in time, such as the onset of a sound or the beginning of a harmonic periodicity. Extracting them is difficult, in part because standard block-based signal analysis methods yield a representation that is sensitive to the arbitrary alignment of the blocks with respect to the signal. Convolutional techniques such as shift-invariant transformations can reduce this sensitivity, but they do not yield an efficient code, that is, one that forms a nonredundant representation of the underlying structure. Here, we develop a non-block-based method for signal representation that is both time relative and efficient. Signals are represented as a linear superposition of time-shiftable kernel functions, each with an associated magnitude and temporal position. Signal decomposition in this method is a nonlinear process that optimizes the kernel function scaling coefficients and temporal positions to form an efficient, shift-invariant representation. We demonstrate the properties of this representation by characterizing structure in various types of nonstationary acoustic signals. The computational problem investigated here is directly relevant to neural coding at the auditory nerve and to the more general issue of how to encode complex, time-varying signals with a population of spiking neurons.
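
As a rough illustration of the representation described above, the sketch below writes a signal as a sum of scaled, time-shifted kernels and recovers the (kernel, position, magnitude) triples with a greedy matching pursuit. The abstract does not say which optimization procedure is used, so matching pursuit is an assumption here; the damped-sinusoid kernel shape, the function names, and all parameter values are likewise hypothetical choices made for the example.

```python
import math

def damped_sinusoid_kernel(freq_hz, length, sample_rate):
    """Unit-norm damped sinusoid. A stand-in kernel shape (assumption)."""
    k = []
    for n in range(length):
        t = n / sample_rate
        envelope = t * math.exp(-2 * math.pi * 0.2 * freq_hz * t)
        k.append(envelope * math.cos(2 * math.pi * freq_hz * t))
    norm = math.sqrt(sum(v * v for v in k))
    return [v / norm for v in k]

def decode(spikes, kernels, n_samples):
    """Linear superposition: add magnitude * kernel at each temporal position."""
    x = [0.0] * n_samples
    for m, tau, s in spikes:
        for j, v in enumerate(kernels[m]):
            if tau + j < n_samples:
                x[tau + j] += s * v
    return x

def encode(signal, kernels, n_spikes):
    """Greedy matching pursuit (assumed algorithm, not taken from the abstract).

    Each iteration finds the kernel and shift with the largest absolute
    correlation with the residual, records it as a spike, and subtracts it.
    """
    residual = list(signal)
    spikes = []
    for _ in range(n_spikes):
        best = None
        for m, k in enumerate(kernels):
            for tau in range(len(residual) - len(k) + 1):
                c = sum(residual[tau + j] * k[j] for j in range(len(k)))
                if best is None or abs(c) > abs(best[2]):
                    best = (m, tau, c)
        m, tau, s = best
        for j, v in enumerate(kernels[m]):
            residual[tau + j] -= s * v
        spikes.append((m, tau, s))
    return spikes, residual

# Toy example: two kernels, two non-overlapping events at arbitrary positions.
sample_rate = 8000
kernels = [damped_sinusoid_kernel(f, 40, sample_rate) for f in (500.0, 1500.0)]
true_spikes = [(0, 30, 1.0), (1, 110, -0.5)]  # (kernel index, position, magnitude)
signal = decode(true_spikes, kernels, 200)

spikes, residual = encode(signal, kernels, n_spikes=2)
for m, tau, s in sorted(spikes, key=lambda sp: sp[1]):
    print(f"kernel {m} at sample {tau} with magnitude {s:.3f}")
```

Because each event is stored with its own temporal position rather than a block index, shifting the input by a few samples only shifts the recovered positions; the magnitudes and kernel choices stay the same, which is the time-relative property the abstract emphasizes. The exhaustive search over shifts is kept for clarity and would normally be replaced by a faster correlation.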
