Improved spoken term detection by discriminative training of acoustic models based on user relevance feedback

In a previous paper [1], we proposed a new framework for spoken term detection that exploits user relevance feedback to estimate better acoustic model parameters, which are then used to rescore the spoken segments. In this way, the acoustic models can be trained with a criterion that directly targets retrieval performance, and retrieval performance becomes less dependent on the availability of acoustic models well matched to the corpora being searched. In this paper, we propose a new set of objective functions for acoustic model training within this framework, designed around the nature of the retrieval process and its performance measure, and we develop discriminative training algorithms that maximize these objective functions. Significant performance improvements were obtained in preliminary experiments.

[1] Sridha Sridharan, et al. Optimising Figure of Merit for phonetic spoken term detection, 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[2] Simon King, et al. Term-dependent confidence for out-of-vocabulary term detection, 2009, INTERSPEECH.

[3] Richard Sproat, et al. Lattice-Based Search for Spoken Utterance Retrieval, 2004, NAACL.

[4] Lin-Shan Lee, et al. Integrating recognition and retrieval with user feedback: A new framework for spoken term detection, 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[5] Jonathan Le Roux, et al. Discriminative Training for Large-Vocabulary Speech Recognition Using Minimum Classification Error, 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[6] Masataka Goto, et al. Podcastle: collaborative training of acoustic models on the basis of wisdom of crowds for podcast transcription, 2009, INTERSPEECH.

[7] Peng Yu, et al. Towards Spoken-Document Retrieval for the Internet: Lattice Indexing For Large-Scale Web-Search Architectures, 2006, NAACL.

[8] Thorsten Joachims, et al. Optimizing search engines using clickthrough data, 2002, KDD.

[9] Ellen M. Voorhees, et al. The TREC Spoken Document Retrieval Track: A Success Story, 2000, TREC.

[10] Bhuvana Ramabhadran, et al. Balancing false alarms and hits in Spoken Term Detection, 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[11] Xuehua Shen, et al. Context-sensitive information retrieval using implicit feedback, 2005, SIGIR '05.

[12] Timothy J. Hazen, et al. Query-by-example spoken term detection using phonetic posteriorgram templates, 2009, 2009 IEEE Workshop on Automatic Speech Recognition & Understanding.