Generalized fast on-the-fly composition algorithm for WFST-based speech recognition

This paper describes a Generalized Fast On-the-fly Composition (GFOC) algorithm for Weighted Finite-State Transducers (WFSTs) in speech recognition. We previously proposed the original version of GFOC, which yields fast and memory-efficient decoding with two WFSTs. GFOC extends this to fast on-the-fly composition of three or more WFSTs during decoding. In many cases it is difficult or impossible to organize the entire transduction process of speech recognition with only one or two WFSTs, since some types of models grow considerably when represented in WFST form. For example, a class n-gram model often yields a WFST several times larger than that of a word n-gram model over the same vocabulary. GFOC makes such a model usable by decomposing it into multiple smaller WFSTs. In a spontaneous speech transcription task, we evaluated the WFST sizes, decoding speed, and word accuracy of several decoding approaches. The results show that GFOC with three or more WFSTs is an efficient algorithm when a class-based language model is used.
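To illustrate the core idea of on-the-fly (lazy) composition of a cascade of WFSTs, the following is a minimal sketch, not the paper's actual GFOC algorithm: composed states are pairs of component states, and outgoing arcs are expanded only on demand during decoding. Epsilon transitions, composition filters, and semiring generality are omitted for brevity; the `Wfst`, `LazyCompose`, and arc-tuple representations are illustrative assumptions, not the authors' data structures.

```python
class Wfst:
    """A WFST stored explicitly: state -> list of arcs.

    Each arc is (input_label, output_label, weight, next_state),
    with weights in the tropical semiring (addition = composition).
    """
    def __init__(self, arcs, start):
        self._arcs, self.start = arcs, start

    def arcs(self, state):
        return self._arcs.get(state, [])


class LazyCompose:
    """On-the-fly composition of two transducers.

    Composed states are pairs (s_a, s_b); arcs are generated only when
    a state is first visited and are memoized. Chaining LazyCompose
    objects gives on-the-fly composition of three or more WFSTs.
    Epsilon handling is deliberately omitted in this sketch.
    """
    def __init__(self, a, b):
        self.a, self.b = a, b
        self.start = (a.start, b.start)
        self._cache = {}  # composed state -> expanded arc list

    def arcs(self, state):
        if state not in self._cache:
            sa, sb = state
            self._cache[state] = [
                (i1, o2, w1 + w2, (n1, n2))
                for (i1, o1, w1, n1) in self.a.arcs(sa)
                for (i2, o2, w2, n2) in self.b.arcs(sb)
                if o1 == i2  # output of the first must feed the second
            ]
        return self._cache[state]


# Toy three-WFST cascade: A maps a->x, B maps x->y, C maps y->z.
A = Wfst({0: [('a', 'x', 1.0, 1)]}, 0)
B = Wfst({0: [('x', 'y', 2.0, 1)]}, 0)
C = Wfst({0: [('y', 'z', 0.5, 1)]}, 0)
ABC = LazyCompose(LazyCompose(A, B), C)
print(ABC.arcs(ABC.start))  # [('a', 'z', 3.5, ((1, 1), 1))]
```

Only composed states actually reached by the decoder are ever expanded, which is what keeps memory low when the statically composed machine would be prohibitively large.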