A monolithic speech recognizer based on fully recurrent neural networks

This paper reports on investigations into the application of fully recurrent neural networks (FRNNs) to speaker-independent speech recognition. In a phoneme-based recognition system, separate FRNNs are used for feature scoring and for compensating variations in the time durations of speech segments. A recognizer with an FRNN for feature scoring achieves the same recognition rate as a recognition system in which the context information is provided. The performance of the FRNN used for time alignment is comparable to that of a Viterbi-based alignment with durational constraints. Additionally, a monolithic speech recognizer is realized by an FRNN that directly classifies feature sequences. The performance of this FRNN is comparable to that of speech recognition systems that are based on discrete hidden Markov models and use sophisticated durational modeling. Furthermore, simulation experiments revealed that FRNNs are able to extract the information relevant for speech recognition from noise-contaminated speech and thus achieve robust recognition performance.
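
As a rough illustration of how a fully recurrent network could classify a feature sequence directly, the following minimal sketch runs a forward pass in which every unit receives the previous activations of all units, the current feature frame, and a bias. The abstract does not specify the architecture, so everything here is an assumption made for illustration: the names `frnn_classify` and `random_weights`, the logistic activation, the choice of the first `n_classes` units as outputs, and the decision rule of reading the outputs after the last frame. Training is not shown.

```python
import math
import random


def logistic(a):
    # Logistic function written via tanh so large |a| cannot overflow.
    return 0.5 * (1.0 + math.tanh(0.5 * a))


def random_weights(n_units, n_inputs, seed=0):
    # Hypothetical helper: one weight row per unit, covering all unit
    # activations, all input features, and one bias term.
    rng = random.Random(seed)
    width = n_units + n_inputs + 1
    return [[rng.uniform(-0.5, 0.5) for _ in range(width)] for _ in range(n_units)]


def frnn_classify(frames, weights, n_units, n_classes):
    # Assumed setup: the first n_classes units act as output units, and the
    # class is decided from their activations after the final frame.
    state = [0.0] * n_units
    for frame in frames:
        # Each unit sees every unit's previous activation (full recurrence),
        # the current feature frame, and a constant bias input.
        z = state + list(frame) + [1.0]
        state = [logistic(sum(w * v for w, v in zip(row, z))) for row in weights]
    scores = state[:n_classes]
    return scores.index(max(scores)), scores


if __name__ == "__main__":
    w = random_weights(n_units=6, n_inputs=3, seed=1)
    frames = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4], [0.2, 0.0, -0.3]]
    label, scores = frnn_classify(frames, w, n_units=6, n_classes=2)
    print(label, scores)
```

Because the state is carried from frame to frame, the same weights handle sequences of any length, which is the property that lets a single network absorb both feature scoring and duration variation in a monolithic design.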