Posterior probability based indexing method for Chinese spoken document retrieval

Syllable-lattice-based Chinese speech retrieval methods can avoid the out-of-vocabulary (OOV) word problem and compensate for the retrieval performance loss caused by recognition errors. Because lattice-based retrieval approaches lack an effective indexing method, this paper proposes a posterior-probability-based indexing method that uses syllables and K-step neighbor syllable pairs as index terms and takes their posterior probabilities as term weights in an improved vector space model. A series of retrieval experiments shows that the proposed method is better suited to lattice-based spoken document retrieval tasks and that the improvement achieves its intended purpose.
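
The indexing idea described above can be illustrated with a minimal sketch. It is not the paper's implementation, and several details are assumptions made only for illustration: the lattice is simplified to a sequence of time slots, each mapping candidate syllables to posterior probabilities; the weight of a K-step pair is taken as the product of the two posteriors; the step distance is stored as part of the pair key; and plain cosine similarity stands in for the improved vector space model, whose exact form the abstract does not give. The names `index_document`, `cosine`, and `max_step` are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def index_document(slots, max_step=2):
    """Build a posterior-weighted index vector from a simplified lattice.

    `slots` is a list of dicts mapping syllable -> posterior probability
    (an assumed simplification of a syllable lattice).
    """
    vec = defaultdict(float)
    for i, slot in enumerate(slots):
        # Single syllables: accumulate their posteriors as term weights.
        for syl, p in slot.items():
            vec[(syl,)] += p
        # K-step neighbor pairs: pair this slot with the slot k steps ahead.
        for k in range(1, max_step + 1):
            if i + k >= len(slots):
                break
            for a, pa in slot.items():
                for b, pb in slots[i + k].items():
                    # Assumed pair weight: product of the two posteriors.
                    vec[(a, b, k)] += pa * pb
    return dict(vec)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy data (made up): two spoken documents and one query.
doc_a = [{"zhong": 0.9, "chong": 0.1}, {"guo": 1.0}, {"ren": 0.8, "reng": 0.2}]
doc_b = [{"mei": 1.0}, {"guo": 0.7, "gou": 0.3}, {"ren": 1.0}]
query = [{"zhong": 1.0}, {"guo": 1.0}]

vec_a = index_document(doc_a)
vec_b = index_document(doc_b)
vec_q = index_document(query)

# Rank documents by similarity to the query, highest first.
ranking = sorted(
    [("doc_a", cosine(vec_q, vec_a)), ("doc_b", cosine(vec_q, vec_b))],
    key=lambda item: item[1],
    reverse=True,
)
print(ranking)
```

In this toy setup, a competing hypothesis such as "chong" still contributes to the index with a small weight, which is how a posterior-weighted index can soften the effect of recognition errors instead of committing to a single best transcript.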