Minimum mean squared error time series classification using an echo state network prediction model

The echo state network (ESN) has recently been proposed as an alternative recurrent neural network model. An ESN consists of a reservoir of conventional processing elements, which are recurrently interconnected with untrained random weights, and a readout layer, which is trained using linear regression methods. The key advantage of the ESN is its ability to model systems without the need to train the recurrent weights. In this paper, we use an ESN to model the production of speech signals in a classification experiment using isolated utterances of the English digits "zero" through "nine." One prediction model for each digit was trained using frame-based speech features (cepstral coefficients) from all training utterances, and the readout layer consisted of several linear regressors that were trained to target different portions of the time series using a dynamic programming (Viterbi) algorithm. Each test utterance was assigned the label of the digit model with the minimum mean squared prediction error. Using a corpus of 4130 isolated digits from 8 male and 8 female speakers, the highest classification accuracy attained with an ESN was 100.0% (99.1%) on the training (test) set, compared to 100.0% (94.7%) for a hidden Markov model (HMM). HMM performance increased to 100.0% (99.8%) when context features (first- and second-order temporal derivatives) were appended to the cepstral coefficients. The ESN offers an attractive alternative to the HMM because of the ESN's simple training procedure, low computational requirements, and inherent ability to model the dynamics of the signal under study.
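To make the scheme concrete, the following minimal Python/NumPy sketch illustrates the general approach described above: one next-frame prediction ESN per digit, a readout trained by linear (ridge) regression on the reservoir states, and classification of a test utterance by the model with the minimum mean squared prediction error. This is not the authors' implementation; the single readout below is an assumed simplification of the paper's multi-regressor readout trained with a Viterbi-style dynamic programming segmentation, and all names and parameter values (e.g., n_reservoir, spectral_radius, ridge) are illustrative assumptions.

```python
# Minimal sketch (not the paper's code) of ESN-based minimum-MSE digit classification.
# Assumes each utterance is an array of shape (T, n_features) of cepstral coefficients.

import numpy as np

class ESNPredictor:
    """Echo state network that predicts the next feature frame from the current one."""

    def __init__(self, n_features, n_reservoir=500, spectral_radius=0.9,
                 ridge=1e-6, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.uniform(-0.1, 0.1, (n_reservoir, n_features))
        W = rng.standard_normal((n_reservoir, n_reservoir))
        # Scale the untrained, randomly connected recurrent weights to the
        # desired spectral radius (echo state property); these are never trained.
        W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
        self.W = W
        self.ridge = ridge
        self.W_out = None  # readout weights, trained by linear regression

    def _states(self, frames):
        """Run the reservoir over one utterance and return the state sequence."""
        x = np.zeros(self.W.shape[0])
        states = []
        for u in frames:
            x = np.tanh(self.W_in @ u + self.W @ x)
            states.append(x)
        return np.asarray(states)

    def fit(self, utterances):
        """Train the readout to predict frame t+1 from the reservoir state at t."""
        X, Y = [], []
        for frames in utterances:
            s = self._states(frames)
            X.append(s[:-1])       # states for frames 0 .. T-2
            Y.append(frames[1:])   # targets: frames 1 .. T-1
        X, Y = np.vstack(X), np.vstack(Y)
        # Ridge-regularized least squares; ordinary linear regression if ridge = 0.
        A = X.T @ X + self.ridge * np.eye(X.shape[1])
        self.W_out = np.linalg.solve(A, X.T @ Y)

    def mse(self, frames):
        """Mean squared one-step prediction error on a single utterance."""
        s = self._states(frames)
        pred = s[:-1] @ self.W_out
        return float(np.mean((pred - frames[1:]) ** 2))

def classify(utterance, digit_models):
    """Label an utterance with the digit whose prediction model has minimum MSE."""
    errors = {digit: model.mse(utterance) for digit, model in digit_models.items()}
    return min(errors, key=errors.get)
```

A typical use, under the same assumptions, would be to call fit() once per digit with that digit's training utterances and then classify() each test utterance against the resulting dictionary of models; only the readout solve involves training, which is what keeps the procedure computationally light compared with gradient-based recurrent network training.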
