Automatic speech recognition with neural networks: Beyond nonparametric models

In the last few years, various connectionist models have been applied to many perceptual tasks. In particular, considerable effort has been devoted to speech recognition, in an attempt to exploit the potentially remarkable learning capabilities of connectionist models. In this paper we briefly review the most successful approaches to speech recognition in order to assess their actual contribution to the field. A detailed analysis of the problems that arise in speech recognition allows us to identify some “desiderata” to be met when building more ambitious models. One of the most important targets is an effective model of the temporal dimension of speech. Moreover, many of the proposed connectionist models turn out to be severely limited by their inherently nonparametric structure, which makes many tasks very hard to learn. We suggest methods for introducing prior knowledge into recurrent networks and briefly discuss how such networks can learn more effectively in the presence of “structured tasks”.
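The abstract does not spell out the specific method, so the following is only an illustration of the general idea of injecting prior knowledge into a recurrent network, not the approach proposed in the paper. It is a minimal sketch, in the spirit of automaton-insertion schemes for second-order recurrent networks: the weights are programmed so that the network emulates a known finite automaton before any learning takes place. The automaton (parity of `1` symbols), the gain value, and all function names are hypothetical choices made for this example.

```python
import math
from typing import Dict, Iterable, List, Set, Tuple

# Illustrative sketch only: not the method proposed in the paper.
Transitions = Dict[Tuple[int, int], int]  # (state j, symbol k) -> next state i


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def encode_dfa(n_states: int, n_symbols: int, delta: Transitions,
               gain: float = 10.0):
    """Program second-order weights W[i][j][k] so that the network copies
    the automaton: W[i][j][k] = gain iff delta(j, k) == i, else 0."""
    W = [[[0.0] * n_symbols for _ in range(n_states)] for _ in range(n_states)]
    for (j, k), i in delta.items():
        W[i][j][k] = gain
    # Negative bias: a state unit turns on only if a programmed
    # transition from the currently active state points to it.
    bias = -gain / 2.0
    return W, bias


def run_network(W, bias: float, start: int, symbols: Iterable[int]) -> List[float]:
    n_states = len(W)
    # Start from a one-hot encoding of the initial state.
    S = [1.0 if q == start else 0.0 for q in range(n_states)]
    for k in symbols:
        # With a one-hot input, the second-order sum over (j, k) reduces
        # to a sum over j using the weight slice for symbol k.
        S = [sigmoid(sum(W[i][j][k] * S[j] for j in range(n_states)) + bias)
             for i in range(n_states)]
    return S


def accepts(W, bias: float, start: int, accepting: Set[int],
            symbols: Iterable[int]) -> bool:
    S = run_network(W, bias, start, symbols)
    # Accept if the total activation of accepting-state units exceeds 0.5.
    return sum(S[q] for q in accepting) > 0.5


# Hypothetical prior knowledge: parity of '1' symbols (state 0 = even, 1 = odd).
PARITY: Transitions = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

if __name__ == "__main__":
    W, b = encode_dfa(2, 2, PARITY)
    for word in ["", "1", "11", "101", "1101"]:
        print(repr(word), accepts(W, b, 0, {0}, [int(c) for c in word]))
```

With a sufficiently large gain, the state units stay close to a one-hot code, so the programmed network accepts exactly the strings with an even number of `1` symbols. In a learning setting, such programmed weights would serve as an initial point that gradient-based training can refine, rather than as a fixed solution.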
