Accelerating Segment Model Decoding for LVCSR by Parallel Processing of Neighboring Segments

In human speech, most boundaries between phones or words are fuzzy. Given a time slice that contains exactly one boundary, the boundary may lie at any frame within the slice. Each candidate boundary location yields a different potential observation segment, and because these segments are adjacent in time, they should occupy similar acoustic spaces. We call them neighboring segments. In this paper, we propose a fast decoding algorithm that processes neighboring segments in parallel. Because the decoder can apply a larger pruning threshold when neighboring segments are processed together, the proposed algorithm is faster than decoding each segment separately. The algorithm is integrated into a Segment Model (SM) based Mandarin Large Vocabulary Continuous Speech Recognition (LVCSR) system, where it saves approximately 50% of decoding time without noticeably affecting recognition accuracy.
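The notion of neighboring segments can be illustrated with a minimal sketch. It assumes a fixed segment start frame and a time slice given as an inclusive range of candidate boundary frames. The names `neighboring_segments` and `prune_jointly`, the half-open segment representation, and the shared-beam pruning rule are illustrative assumptions, not the pruning scheme of the system described here. A plain loop stands in for the parallel evaluation.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: (start_frame, end_frame), end exclusive.
Segment = Tuple[int, int]


def neighboring_segments(start: int, slice_begin: int, slice_end: int) -> List[Segment]:
    """Enumerate one candidate segment per possible boundary frame in the slice."""
    if not (start < slice_begin <= slice_end):
        raise ValueError("expected start < slice_begin <= slice_end")
    return [(start, end) for end in range(slice_begin, slice_end + 1)]


def prune_jointly(
    segments: List[Segment],
    score: Callable[[Segment], float],
    beam: float,
) -> List[Tuple[Segment, float]]:
    """Score all neighbors together and keep those within `beam` of the best.

    Assumption: a single beam shared across the whole group of neighbors.
    """
    scores = [score(seg) for seg in segments]
    best = max(scores)
    return [(seg, sc) for seg, sc in zip(segments, scores) if sc >= best - beam]


if __name__ == "__main__":
    candidates = neighboring_segments(start=0, slice_begin=10, slice_end=12)
    # Toy score that prefers a boundary at frame 11 (purely illustrative).
    survivors = prune_jointly(candidates, lambda seg: -abs(seg[1] - 11), beam=0.5)
    print(candidates)
    print(survivors)
```

Scoring the group together means the best neighbor sets the pruning reference for all of them, which is one plausible way that joint processing could discard weak boundary hypotheses earlier than segment-by-segment decoding.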