Query-by-example retrieval via fast sequential dynamic time warping algorithm

In this paper we introduce a novel approach to Query-by-Example (QbE) retrieval that builds on the fundamental principles of posteriorgram-based Spoken Term Detection (STD). The proposed approach modifies the widely used segmental variant of the dynamic programming algorithm. Our solution is a sequential variant of the Dynamic Time Warping (DTW) algorithm that employs a one-step-forward moving strategy. Each DTW search is carried out sequentially, block by block, where each block is a square input distance matrix whose size equals the length of the query. We also examine how to speed up the sequential DTW algorithm without a considerable loss in retrieval performance by implementing a linear time-aligned accumulated distance. Detection accuracy is increased by a weighted cumulative distance score parameter. We therefore call this approach the Weighted Fast Sequential DTW (WFS-DTW) algorithm. A novel PCA-based silence discriminator is used along with this algorithm. The proposed algorithm is evaluated on the ParDat1 corpus using the Term Weighted Value (TWV).
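The block-by-block search described above can be illustrated with a minimal sketch. This is not the WFS-DTW algorithm itself: it assumes a negative-log inner-product distance between posterior vectors, a standard three-neighbour DTW recursion, and plain path-length normalisation in place of the weighted cumulative distance score. The function names and the `hop` parameter are hypothetical and chosen for illustration only.

```python
import math


def frame_distance(p, q, eps=1e-10):
    """Assumed local distance: negative log of the inner product of two posterior vectors."""
    return -math.log(max(sum(a * b for a, b in zip(p, q)), eps))


def dtw_block(dist):
    """Standard DTW over one block; returns the accumulated cost divided by the path length."""
    n, m = len(dist), len(dist[0])
    inf = float("inf")
    acc = [[inf] * m for _ in range(n)]   # accumulated distance
    steps = [[0] * m for _ in range(n)]   # number of cells on the best path so far
    acc[0][0], steps[0][0] = dist[0][0], 1
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best, best_steps = inf, 0
            # Candidate predecessors: vertical, horizontal, diagonal.
            for pi, pj in ((i - 1, j), (i, j - 1), (i - 1, j - 1)):
                if pi >= 0 and pj >= 0 and acc[pi][pj] < best:
                    best, best_steps = acc[pi][pj], steps[pi][pj]
            acc[i][j] = best + dist[i][j]
            steps[i][j] = best_steps + 1
    return acc[-1][-1] / steps[-1][-1]


def sequential_dtw(query, utterance, hop=1):
    """Slide a square block (side = query length) over the utterance, one DTW search per block."""
    n = len(query)
    scores = []
    for start in range(0, len(utterance) - n + 1, hop):
        window = utterance[start:start + n]
        block = [[frame_distance(qf, uf) for uf in window] for qf in query]
        scores.append((start, dtw_block(block)))
    return scores


# Toy posteriorgrams over two classes; the query pattern occurs at frame 1.
query = [[1.0, 0.0], [0.0, 1.0]]
utterance = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
scores = sequential_dtw(query, utterance)
best_start, best_score = min(scores, key=lambda s: s[1])
```

In this toy example the lowest normalised score marks the block whose frames match the query. A real system would threshold these scores to produce detections, and `hop` trades search speed against the chance of missing a match between block positions.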
