Topic dependent language modelling for spoken term detection

This paper investigates the effect of topic dependent language models (TDLMs) on phonetic spoken term detection (STD) using dynamic match lattice spotting (DMLS). Phonetic STD consists of two steps: indexing and search. In the indexing step, phone recognition converts audio segments into phone sequences, and the accuracy of this conversion directly affects the accuracy of the final STD system. If the topic of a document is known, recognizing the spoken words and indexing them to an intermediate representation is an easier task, and consequently detecting a search term in the document is more accurate and robust. We therefore propose using TDLMs in the indexing stage to improve STD accuracy in situations where the topic of the audio document is known in advance. We show that using TDLMs instead of the traditional general language model (GLM) improves STD performance as measured by the figure of merit (FOM).
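The intuition that a known topic makes recognition easier can be illustrated with a toy experiment. The sketch below is not the paper's setup: it assumes word-level bigram models with add-one smoothing, a tiny invented corpus, and hypothetical helper names (`train_bigram`, `perplexity`, `select_lm`). It only shows that a model trained on in-topic text assigns lower perplexity to an in-topic sentence than a model trained on pooled text, and how an indexer might fall back to the GLM when the topic is unknown.

```python
import math
from collections import Counter

# Toy corpora (invented for illustration only).
corpora = {
    "sports": ["the team won the match", "the team lost the match"],
    "finance": ["the bank raised the rate", "the bank cut the rate"],
}
all_sentences = [s for sents in corpora.values() for s in sents]

# Shared vocabulary of predictable tokens (words plus end-of-sentence).
vocab = {w for s in all_sentences for w in s.split()} | {"</s>"}


def train_bigram(sentences, vocab):
    """Return an add-one-smoothed bigram probability function."""
    bigrams, contexts = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    v = len(vocab)

    def prob(prev, cur):
        return (bigrams[(prev, cur)] + 1) / (contexts[prev] + v)

    return prob


def perplexity(prob, sentence):
    """Per-token perplexity of one sentence under a bigram model."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    log_p = sum(math.log(prob(p, c)) for p, c in zip(tokens, tokens[1:]))
    return math.exp(-log_p / (len(tokens) - 1))


def select_lm(topic, tdlms, glm):
    """Use the topic dependent model if the topic is known, else the GLM."""
    return tdlms.get(topic, glm)


glm = train_bigram(all_sentences, vocab)
tdlms = {t: train_bigram(sents, vocab) for t, sents in corpora.items()}

test_sentence = "the team won the match"
ppl_glm = perplexity(glm, test_sentence)
ppl_tdlm = perplexity(select_lm("sports", tdlms, glm), test_sentence)
print(f"GLM perplexity:  {ppl_glm:.3f}")
print(f"TDLM perplexity: {ppl_tdlm:.3f}")
```

Lower perplexity here is only a proxy; whether it carries over to better phone lattices and a higher FOM is what the paper evaluates experimentally.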
