Modulation analysis of speech through orthogonal FIR filterbank optimization

Newborns must learn to structure incoming acoustic information into segments, words, phrases, etc., before they can start to learn language. This process is thought to rely on the modulation structure of the speech waveform induced by segmental or prosodic regularities in the speech heard by the infant. Here, we investigate how the initial acoustic processing required for modulation analysis can itself be tuned by exposure to the regularities of speech. Starting from the classic definition of modulation, as applied within the channels of a peripheral filterbank, we formulate a mathematical framework in which the structure of the initial spectral filtering is adapted for modulation analysis. Our working hypothesis is that the human ear and brain are adapted to the analysis of modulation through a data-driven learning process operating on the scale of development (or possibly evolution). Simulation results are presented, together with a comparison to filterbanks classically used in signal processing.
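
To make the setting concrete, the sketch below (Python with NumPy/SciPy; not the authors' code) illustrates the two ingredients of such a framework under simplifying assumptions: the orthogonal FIR filterbank is restricted to an M x M block transform built from DCT rows (the shortest paraunitary case), the per-channel modulation is taken as the power spectrum of the analytic-signal envelope, and the data-driven adaptation is replaced by a coarse grid search over a single Givens rotation angle that keeps the filterbank orthogonal while concentrating envelope power in a nominal 2-8 Hz "syllabic" modulation band. All function names, the test signal, and the band limits are illustrative choices, not taken from the paper.

    import numpy as np
    from scipy.fft import dct
    from scipy.signal import hilbert

    def orthogonal_block_filterbank(M):
        """Rows of an orthonormal DCT-II matrix, used as M length-M analysis filters."""
        return dct(np.eye(M), norm='ortho', axis=0)

    def analyze(x, H):
        """Split x into non-overlapping length-M blocks and apply the analysis filters."""
        M = H.shape[0]
        frames = x[:len(x) - len(x) % M].reshape(-1, M)   # (n_frames, M)
        return frames @ H.T                               # one column per channel

    def modulation_spectra(channels, frame_rate):
        """Per-channel envelope (analytic magnitude), then its power spectrum."""
        env = np.abs(hilbert(channels, axis=0))
        env = env - env.mean(axis=0)                      # remove DC before the FFT
        spec = np.abs(np.fft.rfft(env, axis=0)) ** 2
        freqs = np.fft.rfftfreq(env.shape[0], d=1.0 / frame_rate)
        return freqs, spec

    def syllabic_fraction(x, H, frame_rate, lo=2.0, hi=8.0):
        """Share of (non-DC) envelope power falling in the lo-hi Hz modulation band."""
        freqs, spec = modulation_spectra(analyze(x, H), frame_rate)
        band = (freqs >= lo) & (freqs <= hi)
        return spec[band].sum() / spec[1:].sum()

    def rotate_pair(H, i, j, theta):
        """Givens rotation of filters i and j; the filterbank stays orthogonal."""
        R = H.copy()
        c, s = np.cos(theta), np.sin(theta)
        R[i], R[j] = c * H[i] + s * H[j], -s * H[i] + c * H[j]
        return R

    if __name__ == "__main__":
        fs, M = 16000, 32
        t = np.arange(2 * fs) / fs
        # toy stand-in for speech: a 1 kHz carrier amplitude-modulated at 4 Hz
        x = (1 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)

        H = orthogonal_block_filterbank(M)
        freqs, spec = modulation_spectra(analyze(x, H), frame_rate=fs / M)
        k = np.argmax(spec.sum(axis=1)[1:]) + 1           # skip the DC bin
        print(f"dominant modulation frequency ~ {freqs[k]:.1f} Hz")

        # Crude stand-in for the data-driven adaptation: grid-search one Givens
        # angle mixing two filters so that the (still orthogonal) filterbank
        # concentrates envelope power in the syllabic modulation band.
        score, theta = max((syllabic_fraction(x, rotate_pair(H, 3, 4, th), fs / M), th)
                           for th in np.linspace(0.0, np.pi / 2, 45))
        print(f"best rotation angle {theta:.2f} rad, syllabic fraction {score:.3f}")

A fuller implementation would replace the block transform with longer paraunitary FIR filters and the single-angle search with a gradient method over the rotation or lattice parameters that describe the orthogonal filter set; the sketch only fixes the notions of subband envelope and modulation spectrum referred to above.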
