Novel methods for query selection and query combination in query-by-example spoken term detection

Query-by-example (QbE) spoken term detection (STD) is necessary for low-resource scenarios where training material is hardly available and word-based speech recognition systems cannot be employed. We present two novel contributions to QbE STD: the first introduces several criteria to select the optimal example used as query throughout the search system. The second presents a novel feature level example combination to construct a more robust query used during the search. Experiments, tested on with-in language and cross-lingual QbE STD setups, show a significant improvement when the query is selected according to an optimal criterion over when the query is selected randomly for both setups and a significant improvement when several examples are combined to build the input query for the search system compared with the use of the single best example. They also show comparable performance to that of a state-of-the-art acoustic keyword spotting system.

[1]  Timothy J. Hazen,et al.  Query-by-example spoken term detection using phonetic posteriorgram templates , 2009, 2009 IEEE Workshop on Automatic Speech Recognition & Understanding.

[2]  Lukás Burget,et al.  Investigation into bottle-neck features for meeting speech recognition , 2009, INTERSPEECH.

[3]  Bhuvana Ramabhadran,et al.  Query-by-example Spoken Term Detection For OOV terms , 2009, 2009 IEEE Workshop on Automatic Speech Recognition & Understanding.

[4]  Sridha Sridharan,et al.  Rapid Yet Accurate Speech Indexing Using Dynamic Match Lattice Spotting , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[5]  Bhuvana Ramabhadran,et al.  Vocabulary independent spoken term detection , 2007, SIGIR.

[6]  James R. Glass,et al.  Unsupervised spoken keyword spotting via segmental DTW on Gaussian posteriorgrams , 2009, 2009 IEEE Workshop on Automatic Speech Recognition & Understanding.

[7]  Andreas Stolcke,et al.  The SRI/OGI 2006 spoken term detection system , 2007, INTERSPEECH.

[8]  Timothy J. Hazen,et al.  A comparison of query-by-example methods for spoken term detection , 2009, INTERSPEECH.

[9]  Bhuvana Ramabhadran,et al.  Effect of pronounciations on OOV queries in spoken term detection , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[10]  Lukás Burget,et al.  Comparison of keyword spotting approaches for informal continuous speech , 2005, INTERSPEECH.

[11]  Bhuvana Ramabhadran,et al.  Effect of pronunciations on OOV queries in spoken term detection , 2009 .

[12]  W. Russell,et al.  Continuous hidden Markov modeling for speaker-independent word spotting , 1989, International Conference on Acoustics, Speech, and Signal Processing,.

[13]  David Miller,et al.  From switchboard to fisher: telephone collection protocols, their uses and yields , 2003, INTERSPEECH.

[14]  Kenney Ng,et al.  Subword-based approaches for spoken document retrieval , 2000, Speech Commun..