Spotting Japanese CV-syllables and phonemes using the time-delay neural networks

The authors present techniques for spotting Japanese CV syllables and phonemes in input speech using time-delay neural networks (TDNNs). They constructed a TDNN that can discriminate a single CV syllable or phoneme group. Japanese has only about one hundred syllables, or fewer than 30 phonemes, which makes it feasible to prepare and train the TDNN to spot all possible syllables or phonemes, using training tokens extracted from training words. Syllable and phoneme spotting experiments show excellent results, including a syllable spotting rate better than 96.7%. These spotting techniques are shown to be a significant step toward continuous speech recognition.
