Improved dynamic match phone lattice search using Viterbi scores and Jaro Winkler distance for keyword spotting system

Keyword spotting (KWS) is the task of finding all occurrences of chosen words in speech utterances. One well-known approach to KWS is phone lattice search (PLS), in which the accuracy and speed of the lattice search are the most important concerns. A measure commonly used in PLS is the Minimum Edit Distance (MED). While this measure increases the detection rate, it also raises the false alarm rate. In this paper, we propose several approaches to reduce the false alarm rate and increase the search speed. We use Viterbi scores and the Jaro-Winkler distance alongside the MED measure to decrease the false alarm rate, and we use lattice pruning and indexing techniques to speed up the search over the lattice. Results show that the proposed method improves both the accuracy and the search speed of the KWS system compared with using the MED measure alone.
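
As a rough illustration of how the two distance measures named above can act together, the following Python sketch scores a candidate phone sequence against a keyword's phone sequence with both MED and the Jaro-Winkler distance. The abstract does not state how the measures are combined, so the `accept` rule, its thresholds, the prefix scale of 0.1, and the example phone strings are all assumptions made for illustration. Viterbi scores, lattice pruning, and indexing are not modeled here.

```python
def edit_distance(a, b):
    """Minimum edit distance (unit-cost insert/delete/substitute) between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def jaro(a, b):
    """Jaro similarity in [0, 1] between two sequences."""
    if list(a) == list(b):
        return 1.0
    la, lb = len(a), len(b)
    if la == 0 or lb == 0:
        return 0.0
    window = max(max(la, lb) // 2 - 1, 0)
    a_hit = [False] * la
    b_hit = [False] * lb
    matches = 0
    for i in range(la):
        for j in range(max(0, i - window), min(lb, i + window + 1)):
            if not b_hit[j] and a[i] == b[j]:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched symbols that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(la):
        if a_hit[i]:
            while not b_hit[k]:
                k += 1
            if a[i] != b[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order / 2
    return (matches / la + matches / lb + (matches - transpositions) / matches) / 3


def jaro_winkler_distance(a, b, prefix_scale=0.1):
    """1 - Jaro-Winkler similarity; the common prefix (up to 4 symbols) boosts similarity."""
    sim = jaro(a, b)
    prefix = 0
    for x, y in zip(a, b):
        if x != y or prefix == 4:
            break
        prefix += 1
    return 1.0 - (sim + prefix * prefix_scale * (1.0 - sim))


def accept(keyword, candidate, max_med=1, max_jw=0.1):
    """Hypothetical decision rule: a candidate must pass both thresholds (assumed values)."""
    return (edit_distance(keyword, candidate) <= max_med
            and jaro_winkler_distance(keyword, candidate) <= max_jw)


if __name__ == "__main__":
    keyword = ["s", "a", "l", "a", "m"]      # illustrative phone string
    late_error = ["s", "a", "l", "a", "n"]   # one substitution at the end
    early_error = ["n", "a", "l", "a", "m"]  # one substitution at the start
    for cand in (late_error, early_error):
        print(cand, edit_distance(keyword, cand),
              round(jaro_winkler_distance(keyword, cand), 3), accept(keyword, cand))
```

In this toy setting both candidates lie at MED 1 from the keyword, but the Jaro-Winkler distance separates them because it rewards a shared prefix. With the assumed thresholds, the second measure rejects a candidate that MED alone would admit.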
