Nonlinear Prediction of Speech by Echo State Networks

Speech prediction plays a key role in many speech signal processing and speech communication methods. While linear prediction of speech is well studied, nonlinear speech prediction is attracting increasing interest, especially given the large number of neural network topologies proposed recently. In this paper, nonlinear speech prediction is performed by the echo state network, a special kind of recurrent neural network that requires no prior training and instead adaptively updates its output layer weights. Simulations show its superior performance in terms of prediction gain compared to other well-known prediction approaches, exceeding all baselines in all conditions by up to 8 dB.
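
To make the idea concrete, the following is a minimal sketch of one-step-ahead prediction with an echo state network whose reservoir is fixed and random and whose output weights are adapted sample by sample. It is an illustration under stated assumptions, not the configuration used in the paper: the NLMS update rule, the reservoir size, the sparsity, the scaling factor, and all function names (make_reservoir, esn_predict, prediction_gain_db) are chosen here for brevity, and a synthetic two-tone signal stands in for speech.

```python
import math
import random


def make_reservoir(n_units, scale=0.8, density=0.2, seed=0):
    """Draw fixed random input and recurrent weights (never trained)."""
    rng = random.Random(seed)
    w_in = [rng.uniform(-0.5, 0.5) for _ in range(n_units)]
    w = [[rng.uniform(-1.0, 1.0) if rng.random() < density else 0.0
          for _ in range(n_units)] for _ in range(n_units)]
    # Assumption: rescale so the maximum absolute row sum equals `scale`.
    # With scale < 1 and tanh units this is a conservative sufficient
    # condition for the state to forget its initial value.
    norm = max(sum(abs(v) for v in row) for row in w) or 1.0
    w = [[scale * v / norm for v in row] for row in w]
    return w_in, w


def esn_predict(signal, n_units=20, mu=0.5, eps=1e-6, seed=0):
    """Predict signal[n] from past samples; return the prediction errors."""
    w_in, w = make_reservoir(n_units, seed=seed)
    state = [0.0] * n_units
    w_out = [0.0] * n_units  # only these weights are adapted
    errors = []
    for n in range(1, len(signal)):
        u = signal[n - 1]
        # Reservoir update driven by the previous sample.
        state = [math.tanh(w_in[i] * u
                           + sum(w[i][j] * state[j] for j in range(n_units)))
                 for i in range(n_units)]
        y = sum(a * b for a, b in zip(w_out, state))  # linear readout
        e = signal[n] - y
        errors.append(e)
        # NLMS update of the output weights (illustrative choice).
        power = eps + sum(x * x for x in state)
        w_out = [a + mu * e * x / power for a, x in zip(w_out, state)]
    return errors


def prediction_gain_db(signal, errors):
    """Ratio of signal power to prediction error power, in dB."""
    sig_pow = sum(s * s for s in signal) / len(signal)
    err_pow = sum(e * e for e in errors) / len(errors)
    return 10.0 * math.log10(sig_pow / err_pow)


# Usage on a synthetic two-tone signal (a stand-in for a speech frame).
x = [math.sin(0.1 * n) + 0.5 * math.sin(0.23 * n) for n in range(1000)]
err = esn_predict(x)
half = len(err) // 2
# Skip the first half so the initial adaptation does not dominate.
gain = prediction_gain_db(x[1 + half:], err[half:])
print(f"prediction gain after warm-up: {gain:.1f} dB")
```

Because only the linear readout is adapted, the per-sample cost is dominated by the reservoir update, and no offline training phase is needed. A recursive least squares update could replace the NLMS step at higher cost per sample.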
