Exploration of rank order coding with spiking neural networks for speech recognition

Speech recognition is very difficult when the speech is noisy or corrupted. Most conventional techniques need huge databases to estimate the probability densities of speech (or noise) before they can perform recognition. We discuss the potential of perceptive speech analysis and processing in combination with biologically plausible neural network processors. We illustrate the potential of such non-linear processing of speech with a preliminary test: recognition of French spoken digits from a small speech database.
