Phoneme Recognition with Large Hierarchical Reservoirs

Automatic speech recognition has gradually improved over the years, but the reliable recognition of unconstrained speech is still not within reach. In order to achieve a breakthrough, many research groups are now investigating new methodologies that have potential to outperform the Hidden Markov Model technology that is at the core of all present commercial systems. In this paper, it is shown that the recently introduced concept of Reservoir Computing might form the basis of such a methodology. In a limited amount of time, a reservoir system that can recognize the elementary sounds of continuous speech has been built. The system already achieves a state-of-the-art performance, and there is evidence that the margin for further improvements is still significant.

[1]  Francis Jack Smith,et al.  Improved phone recognition using Bayesian triphone models , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[2]  Jürgen Schmidhuber,et al.  Framewise phoneme classification with bidirectional LSTM and other neural network architectures , 2005, Neural Networks.

[3]  Helmut Hauser,et al.  Echo state networks with filter neurons and a delay&sum readout , 2010, Neural Networks.

[4]  John G. Harris,et al.  Automatic speech recognition using a predictive echo state network classifier , 2007, Neural Networks.

[5]  Geoffrey E. Hinton,et al.  Deep Belief Networks for phone recognition , 2009 .

[6]  Benjamin Schrauwen,et al.  Generative Modeling of Autonomous Robots and their Environments using Reservoir Computing , 2007, Neural Processing Letters.

[7]  Herbert Jaeger,et al.  Optimization and applications of echo state networks with leaky- integrator neurons , 2007, Neural Networks.

[8]  Henry Markram,et al.  Real-Time Computing Without Stable States: A New Framework for Neural Computation Based on Perturbations , 2002, Neural Computation.

[9]  Pavel Matejka,et al.  Hierarchical Structures of Neural Networks for Phoneme Recognition , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[10]  Benjamin Schrauwen,et al.  An experimental unification of reservoir computing methods , 2007, Neural Networks.

[11]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[12]  Patrick Wambacq,et al.  SPRAAK: an open source "SPeech recognition and automatic annotation kit" , 2008, INTERSPEECH.

[13]  Benjamin Schrauwen,et al.  On Computational Power and the Order-Chaos Phase Transition in Reservoir Computing , 2008, NIPS.

[14]  Benjamin Schrauwen,et al.  A hierarchy of recurrent networks for speech recognition , 2009, NIPS 2009.

[15]  Anthony J. Robinson,et al.  An application of recurrent nets to phone probability estimation , 1994, IEEE Trans. Neural Networks.

[16]  Benjamin Schrauwen,et al.  Isolated word recognition using a Liquid State Machine , 2005, ESANN.

[17]  Hsiao-Wuen Hon,et al.  Speaker-independent phone recognition using hidden Markov models , 1989, IEEE Trans. Acoust. Speech Signal Process..

[18]  Hervé Bourlard,et al.  Continuous speech recognition by connectionist statistical methods , 1993, IEEE Trans. Neural Networks.

[19]  Jonathan G. Fiscus,et al.  Darpa Timit Acoustic-Phonetic Continuous Speech Corpus CD-ROM {TIMIT} | NIST , 1993 .

[20]  J P Martens,et al.  Pitch and voiced/unvoiced determination with an auditory model. , 1992, The Journal of the Acoustical Society of America.