Nonlinear enhancement of noisy speech, using continuous attractor dynamics formed in recurrent neural networks

Here, the formation of continuous attractor dynamics in a nonlinear recurrent neural network is used to achieve nonlinear speech denoising, with the goal of robust phoneme recognition and information retrieval. Attractor dynamics are first formed in the recurrent neural network by training the clean-speech subspace as a continuous attractor. The network is then used to recognize speech corrupted by both stationary and nonstationary noise. In this work, the performance of a nonlinear feedforward network is compared with that of the same network augmented with a recurrent connection in its hidden layer. The structure and training of this recurrent connection are designed so that the network learns to denoise the signal step by step, exploiting the properties of the attractors it has formed, while simultaneously performing phone recognition. With these connections, recognition accuracy improves by 21% for stationary noise and by 14% for nonstationary noise at 0 dB SNR, relative to a feedforward neural network used as the reference model.
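To make the idea concrete, the sketch below illustrates (under stated assumptions, not as the authors' exact architecture or training procedure) how a hidden-layer recurrent connection can be iterated at test time so that a noisy feature frame relaxes toward the attractor formed by clean speech before phone classification. The layer sizes, weight names (W_in, W_rec, W_out), and the use of MFCC-like frames are illustrative placeholders.

```python
# Minimal sketch (assumed, not the authors' exact model): a feedforward phone
# classifier whose hidden layer carries an extra recurrent connection. At test
# time the hidden state is iterated a few steps so that a noisy input
# representation settles toward the clean-speech attractor before the output
# layer produces phone posteriors.
import numpy as np

rng = np.random.default_rng(0)

n_in, n_hid, n_phones = 39, 128, 30            # e.g. one MFCC frame, hidden units, phone classes
W_in  = rng.normal(0, 0.1, (n_hid, n_in))      # input -> hidden (placeholder weights)
W_rec = rng.normal(0, 0.1, (n_hid, n_hid))     # hidden -> hidden recurrent connection
W_out = rng.normal(0, 0.1, (n_phones, n_hid))  # hidden -> phone posteriors
b_hid = np.zeros(n_hid)
b_out = np.zeros(n_phones)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def recognize(x, n_steps=5):
    """Classify one noisy feature frame x after letting the recurrent
    hidden layer settle for n_steps iterations (attractor relaxation)."""
    h = np.tanh(W_in @ x + b_hid)                  # plain feedforward initialisation
    for _ in range(n_steps):                       # step-by-step denoising
        h = np.tanh(W_in @ x + W_rec @ h + b_hid)  # hidden state drawn toward the attractor
    return softmax(W_out @ h + b_out)

noisy_frame = rng.normal(size=n_in)                # stand-in for a 0 dB SNR feature frame
posterior = recognize(noisy_frame)
print("predicted phone index:", int(posterior.argmax()))
```

In this sketch the reference model corresponds to dropping W_rec and using only the single feedforward pass; the reported 21% and 14% accuracy gains compare the recurrent variant against that feedforward baseline.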
