A segment-based wordspotter using phonetic filler models

A common approach to wordspotting is to augment the keyword models with "filler" models that account for non-keyword intervals. An alternative is to use a large-vocabulary continuous speech recognition (LVCSR) system to produce a word string and then search that string for the keywords. While the latter approach typically yields higher performance, it requires costly computation and extensive training data. We develop several segment-based wordspotters in an effort to achieve performance comparable to that of the LVCSR spotter with only a fraction of the vocabulary. We investigate several methods of modeling the background, ranging from a few general models to refined phone representations. The task is to detect sixty-one keywords in continuous speech from the ATIS domain. The best figure of merit we achieve is 91.4% for the LVCSR spotter and 86.7% for a spotter using 57 phone-based filler models.
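
To make the LVCSR-based alternative concrete, the following is a minimal sketch of its second stage only: scanning a recognizer's output word string for keywords. The function name `spot_keywords`, the example sentence, and the three keywords are hypothetical illustrations and are not taken from the system or the sixty-one-keyword list described here; the recognizer itself is assumed to have already produced the word string.

```python
def spot_keywords(hypothesis, keywords):
    """Return (word_position, keyword) pairs for every keyword in the word string.

    `hypothesis` stands in for the word string produced by a recognizer;
    matching is exact and case-insensitive (an assumption of this sketch).
    """
    keyword_set = {k.lower() for k in keywords}
    hits = []
    for position, word in enumerate(hypothesis.lower().split()):
        if word in keyword_set:
            hits.append((position, word))
    return hits


# Hypothetical recognizer output and keyword list, for illustration only.
hypothesis = "show me flights from boston to denver on monday"
keywords = ["boston", "denver", "monday"]
print(spot_keywords(hypothesis, keywords))
```

A filler-model spotter, by contrast, would not produce a full word string at all: non-keyword intervals would be absorbed by the filler models, so a lookup of this kind would be unnecessary.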