An initial attempt to improve spoken term detection by learning optimal weights for different indexing features

Because different indexing features have different discriminative capabilities for spoken term detection and different levels of recognition reliability, it is reasonable to weight the indexing features in the transcribed lattices differently during spoken term detection. In this paper, we present an initial attempt at two weighting schemes: one context independent (a fixed weight for each feature) and one context dependent (different weights for the same feature in different contexts). These weights can be learned by optimizing a desired spoken term detection performance measure over a training document set and a training query set. Preliminary experiments on unigrams of Chinese characters and syllables over a corpus of Mandarin broadcast news yielded encouraging initial results.
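The context-independent scheme described above can be sketched as follows: each indexing feature type (e.g. character unigram, syllable unigram) receives one scalar weight, retrieval scores are weighted combinations of per-feature posteriors from the lattice index, and the weights are chosen to maximize a retrieval measure such as mean average precision (MAP) on training queries. This is a hypothetical illustration, not the paper's actual learning procedure; the grid search, the data structures, and all names (`relevance_score`, `learn_weights`, the toy index) are assumptions.

```python
# Hypothetical sketch of context-independent feature weighting for spoken
# term detection. One weight per feature type; weights tuned by exhaustive
# grid search to maximize MAP on a training query set (a stand-in for the
# paper's optimization of a detection performance measure).
from itertools import product

def relevance_score(doc_posteriors, weights):
    """Weighted sum of per-feature-type lattice posteriors for one document."""
    return sum(weights[f] * p for f, p in doc_posteriors.items())

def average_precision(ranked_docs, relevant):
    """Standard average precision of a ranked list against a relevant set."""
    hits, ap = 0, 0.0
    for i, d in enumerate(ranked_docs, 1):
        if d in relevant:
            hits += 1
            ap += hits / i
    return ap / max(len(relevant), 1)

def map_score(index, queries, weights):
    """Mean average precision over all training queries.

    index[q][doc] maps each feature type to the posterior score of query q
    in doc's lattice under that feature type's index (0.0 if absent).
    """
    total = 0.0
    for q, relevant in queries.items():
        ranked = sorted(index[q],
                        key=lambda d: -relevance_score(index[q][d], weights))
        total += average_precision(ranked, relevant)
    return total / len(queries)

def learn_weights(index, queries, feature_types, grid=(0.0, 0.5, 1.0)):
    """Pick the weight combination on a coarse grid that maximizes MAP."""
    best_w, best_map = None, -1.0
    for combo in product(grid, repeat=len(feature_types)):
        w = dict(zip(feature_types, combo))
        m = map_score(index, queries, w)
        if m > best_map:
            best_w, best_map = w, m
    return best_w, best_map
```

The context-dependent scheme would replace the single weight per feature type with a weight conditioned on the surrounding context, enlarging the parameter set but leaving the same optimization loop intact.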
