Unconstrained keyword spotting using phone lattices with application to spoken document retrieval

Abstract

Traditional hidden Markov model (HMM) word spotting requires both an explicit HMM for each desired keyword and a computationally expensive decoding pass. For certain applications, such as audio indexing or information retrieval, conventional word spotting may be too constrained or impractically slow. This paper presents an alternative technique in which a phone lattice, representing multiple phone hypotheses, is computed in advance. Given a phone decomposition of any desired keyword, the lattice may be rapidly searched to find putative occurrences of the keyword. Though somewhat less accurate, this approach can be orders of magnitude faster and more flexible (any keyword may be detected) than previous approaches. The paper describes algorithms for lattice generation and scanning and reports experimental results, including a comparison with conventional keyword-HMM approaches. Finally, word spotting based on phone lattice scanning is demonstrated to be effective for spoken document retrieval.
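
As a rough illustration of the lattice-scanning idea, the sketch below searches a toy phone lattice for every path whose labels spell a keyword's phone sequence. It is not the algorithm developed in the paper: it assumes exact phone matching, a depth-first search, and additive log-likelihood scores, and the names `Edge`, `scan_lattice`, and the example scores are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    """One phone hypothesis in the lattice (hypothetical representation)."""
    start: int    # start node, e.g. a frame index
    end: int      # end node
    phone: str    # phone label
    score: float  # acoustic log-likelihood (made-up values below)


def scan_lattice(edges, keyword_phones):
    """Return (start, end, total_score) for each path spelling keyword_phones.

    Assumption: exact phone matches only; hits are sorted best score first.
    """
    if not keyword_phones:
        return []

    # Index edges by start node so each extension step is a dictionary lookup.
    out_edges = defaultdict(list)
    for e in edges:
        out_edges[e.start].append(e)

    hits = []

    def extend(node, idx, start, total):
        # All phones matched: record a putative keyword occurrence.
        if idx == len(keyword_phones):
            hits.append((start, node, total))
            return
        for e in out_edges[node]:
            if e.phone == keyword_phones[idx]:
                extend(e.end, idx + 1, start, total + e.score)

    # A keyword may begin at any edge labelled with its first phone.
    for e in edges:
        if e.phone == keyword_phones[0]:
            extend(e.end, 1, e.start, e.score)

    return sorted(hits, key=lambda h: h[2], reverse=True)


# Toy lattice with two competing phone hypotheses per segment.
edges = [
    Edge(0, 1, "k", -1.0), Edge(0, 1, "g", -1.5),
    Edge(1, 2, "ae", -0.5), Edge(1, 2, "eh", -0.75),
    Edge(2, 3, "t", -0.75), Edge(2, 3, "d", -1.25),
]

print(scan_lattice(edges, ["k", "ae", "t"]))
```

Because the lattice is built once, looking up a new keyword needs only its phone decomposition and a scan of this kind, with no new acoustic decoding pass.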