We present the concept of a "Segmental Neural Net" (SNN) for phonetic modeling in continuous speech recognition. The SNN takes as input all the frames of a phonetic segment and outputs an estimate of the probability of each phoneme given that segment. By considering all the frames of a segment simultaneously, the SNN overcomes the well-known conditional-independence limitation of hidden Markov models (HMMs). However, automatic segmentation with neural nets is computationally far more demanding than with HMMs. Therefore, to take advantage of the training and decoding speed of HMMs, we have developed a novel hybrid SNN/HMM system that combines the advantages of both approaches. In this hybrid system, the N-best paradigm is used to generate likely phonetic segmentations, which are then scored by the SNN. The HMM and SNN scores are then combined to optimize performance. In this manner, the recognition accuracy is guaranteed to be no worse than that of the HMM system alone.
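The rescoring step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Hypothesis` fields, the linear combination with a single `snn_weight`, and all the scores below are assumptions made up for illustration. The sketch only shows how an N-best list scored by the HMM might be reordered once an SNN score is added for each hypothesis.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One entry of an N-best list (hypothetical structure)."""
    words: str
    hmm_log_score: float  # log score assigned by the HMM system
    snn_log_score: float  # assumed: SNN log score for the hypothesis's segmentation


def rescore(nbest, snn_weight):
    """Return the hypotheses sorted by combined score, best first.

    Assumed combination rule: hmm_log_score + snn_weight * snn_log_score.
    """
    def combined(h):
        return h.hmm_log_score + snn_weight * h.snn_log_score

    return sorted(nbest, key=combined, reverse=True)


# Made-up scores, for illustration only.
nbest = [
    Hypothesis("show all ships", -120.0, -35.0),
    Hypothesis("show all chips", -119.5, -42.0),
    Hypothesis("show old ships", -123.0, -33.0),
]

hmm_only = rescore(nbest, snn_weight=0.0)  # ranking by the HMM score alone
hybrid = rescore(nbest, snn_weight=1.0)    # ranking by the combined score

print(hmm_only[0].words)
print(hybrid[0].words)
```

With `snn_weight` set to zero, the ranking reduces to that of the HMM alone. Under the assumption that the weight is chosen to optimize performance, this is one way to read the statement that the hybrid system does no worse than the HMM system.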