Speaker-independent phoneme recognition on TIMIT database using integrated time-delay neural networks (TDNNs)

A neural network (NN) structure is described for speaker-independent, context-independent phoneme recognition. The structure integrates several time-delay neural networks (TDNNs) that are separated according to phoneme duration. As a result, the proposed structure deals with phonemes of varying duration more effectively. In the experimental evaluation, recognition of 16 English vowels was performed using 5268 vowel tokens taken from 480 sentences spoken by 140 speakers (98 male, 42 female) in the TIMIT (TI-MIT) database. The training set comprised 4326 tokens from 100 speakers (69 male, 31 female), and the test set comprised 942 tokens from 40 speakers (29 male, 11 female). The recognition rate was 60.5% (around 70% when the vowels are collapsed into 13 classes), an improvement over the 56% obtained with a single-TDNN structure, which shows the effectiveness of the proposed structure's use of temporal information.
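
The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the dictionary layout and variable names are assumptions introduced here, and the relative error reduction is a quantity derived from the reported rates, not one stated by the authors.

```python
# Counts as stated in the abstract (dictionary layout is illustrative).
train = {"tokens": 4326, "speakers": 100, "male": 69, "female": 31}
test = {"tokens": 942, "speakers": 40, "male": 29, "female": 11}

# Summing the two splits should reproduce the stated totals.
total = {key: train[key] + test[key] for key in train}

# Recognition rates (percent) reported for the 16-vowel task.
single_tdnn_acc = 56.0
integrated_acc = 60.5

# Absolute gain in percentage points.
abs_gain = integrated_acc - single_tdnn_acc

# Derived quantity (not in the abstract): the share of the
# single-TDNN errors removed by the integrated structure.
rel_error_reduction = abs_gain / (100.0 - single_tdnn_acc)

print(total)
print(f"absolute gain: {abs_gain:.1f} points")
print(f"relative error reduction: {rel_error_reduction:.1%}")
```

The summed splits match the stated totals of 5268 tokens and 140 speakers (98 male, 42 female). Under this reading, the error rate falls from 44% to 39.5%, roughly a 10% relative reduction.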
