Paper information - Online phoneme recognition using multi-layer perceptron networks combined with recurrent non-linear autoregressive neural networks with exogenous inputs


Abstract: Off-line pattern recognition in speech signals is a complex task, and it becomes harder still when the result is required online or in real time. This work proposes online identification of Portuguese phonemes using a non-linear autoregressive model with exogenous inputs (NARX). The process first conditions the input speech signal and extracts its frequency characteristics. It then pre-classifies the extracted features into one of the ten phoneme groups of the Portuguese language, using a multilayer perceptron (MLP) network trained with supervised learning. Subsequently, the MLP output vector, together with the vector carrying the input frequencies, feeds a NARX neural network through a time delay of four steps and recurrent feedback links that carry the outputs of all hidden layers of the network. As a result, the proposed process improves the accuracy of online identification of spoken Portuguese phonemes during natural conversation. When the input signal is well conditioned and continuous over time, the process can provide the correct classification in real time with an acceptable accuracy rate.
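The data flow described in the abstract can be sketched roughly as follows. This is only an illustrative forward pass under stated assumptions, not the authors' implementation: the feature size `N_FREQ`, the hidden size `N_HIDDEN`, the number of output classes `N_PHONEMES`, the single hidden layer, the random untrained weights, and the choice to count the current frame among the four delayed inputs are all hypothetical. Only the ten phoneme groups and the delay of four come from the abstract.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(0)

N_FREQ = 16      # hypothetical number of frequency features per frame
N_GROUPS = 10    # ten phoneme groups, as stated in the abstract
N_HIDDEN = 24    # hypothetical hidden-layer size
N_PHONEMES = 30  # hypothetical number of output phoneme classes
DELAY = 4        # delay of four steps, as stated in the abstract


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


class MLP:
    """One-hidden-layer perceptron: frame features -> phoneme-group scores."""

    def __init__(self):
        self.w1 = rng.normal(0, 0.1, (N_HIDDEN, N_FREQ))
        self.w2 = rng.normal(0, 0.1, (N_GROUPS, N_HIDDEN))

    def forward(self, x):
        return softmax(self.w2 @ np.tanh(self.w1 @ x))


class NARX:
    """Recurrent net fed by delayed exogenous inputs and delayed hidden outputs."""

    def __init__(self):
        n_in = N_FREQ + N_GROUPS
        # Tapped delay lines, zero-initialised (assumption).
        self.x_taps = deque([np.zeros(n_in)] * DELAY, maxlen=DELAY)
        self.h_taps = deque([np.zeros(N_HIDDEN)] * DELAY, maxlen=DELAY)
        self.w_x = rng.normal(0, 0.1, (N_HIDDEN, n_in * DELAY))
        self.w_h = rng.normal(0, 0.1, (N_HIDDEN, N_HIDDEN * DELAY))
        self.w_o = rng.normal(0, 0.1, (N_PHONEMES, N_HIDDEN))

    def step(self, x):
        # Push the current input; the oldest of the four taps drops out.
        self.x_taps.append(x)
        h = np.tanh(self.w_x @ np.concatenate(list(self.x_taps))
                    + self.w_h @ np.concatenate(list(self.h_taps)))
        # Feed the new hidden output back for the following steps.
        self.h_taps.append(h)
        return softmax(self.w_o @ h)


def recognise(frames):
    """Run frames one at a time, as an online recogniser would."""
    mlp, narx = MLP(), NARX()
    labels = []
    for x in frames:
        group_scores = mlp.forward(x)                 # MLP pre-classification
        narx_input = np.concatenate([x, group_scores])  # frequencies + MLP output
        labels.append(int(np.argmax(narx.step(narx_input))))
    return labels


if __name__ == "__main__":
    frames = rng.normal(size=(8, N_FREQ))  # stand-in for conditioned speech frames
    print(recognise(frames))
```

Because the weights are random, the printed labels carry no meaning; the sketch only shows how the MLP output and the frequency vector are concatenated and passed through the delay lines one frame at a time.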

Nadia Nedjah | Luiza de Macedo Mourelle | Diana A. Bonilla
