In human speech, most boundaries between phones and words are fuzzy. Given a time slice that contains exactly one boundary, that boundary may lie at any frame within the slice. The different boundary locations yield several candidate observation segments, which should occupy similar acoustic spaces because they are adjacent in the time domain. We call them neighboring segments. This paper proposes a fast decoding algorithm that processes neighboring segments in parallel. Because the decoder can apply a larger pruning threshold when segments are processed in parallel, the proposed algorithm is faster than decoding each segment separately. The algorithm has been integrated into a Segment Model (SM) based Mandarin Large Vocabulary Continuous Speech Recognition (LVCSR) system, where it saves approximately 50% of decoding time with no obvious effect on recognition accuracy.
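To make the notion of neighboring segments concrete, the following minimal sketch enumerates the candidate segments produced by one time slice. It illustrates the definition only, not the proposed decoder: the function name `neighboring_segments`, its parameters, and the half-open frame indexing are assumptions introduced here for illustration.

```python
def neighboring_segments(seg_start, slice_start, slice_end):
    """Enumerate candidate segments for a slice containing one fuzzy boundary.

    Hypothetical helper, assumed for illustration. Frames are integer indices.
    The boundary may fall at any frame b in [slice_start, slice_end]; each
    choice gives a segment covering frames seg_start .. b - 1, returned as a
    half-open pair (seg_start, b).
    """
    if not seg_start < slice_start <= slice_end:
        raise ValueError("expected seg_start < slice_start <= slice_end")
    return [(seg_start, b) for b in range(slice_start, slice_end + 1)]


# A segment starting at frame 10 whose end boundary lies somewhere in 20..23.
segments = neighboring_segments(10, 20, 23)

# Frames shared by every candidate: the stretch before the slice begins.
shared_frames = min(end for _, end in segments) - 10
```

Here the four candidates differ by at most three frames and share their first ten, which is the overlap that makes handling them together attractive.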