TUKE at MediaEval 2013 Spoken Web Search Task

This paper briefly describes a zero-resource query-by-example retrieval system for the MediaEval 2013 Spoken Web Search task. The proposed solution first performs voice activity detection (VAD) using a rule-based approach built on the variance of acceleration MFCC (VAMFCC). PCA-based segmentation, K-means clustering, and GMM training are then used to build the posteriorgrams. Finally, two search architectures are evaluated: one based on posteriorgram matching (SDTW) and one based on GMM modeling (GMM-FST). The results show that none of our systems achieves a positive Actual Term Weighted Value, because of the high number of insertions. We suppose that the chosen clustering scheme generated too many false alarms. Only the provided data were used; no other resources were used in any system component during development.
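
To illustrate why a high number of insertions can push the Term Weighted Value below zero, the following minimal sketch computes the metric in its commonly used form, TWV = 1 - mean(P_miss + beta * P_FA), averaged over query terms. The function name, the count layout, and the value of beta used here are illustrative assumptions, not the settings of the official evaluation.

```python
def term_weighted_value(per_term_counts, beta):
    """Compute TWV = 1 - mean(P_miss + beta * P_FA) over query terms.

    per_term_counts: iterable of tuples
        (n_true, n_correct, n_false_alarm, n_nontarget)
    beta: weight of false alarms relative to misses
        (assumed value; the real one is set by the evaluation plan).
    """
    penalties = []
    for n_true, n_correct, n_false_alarm, n_nontarget in per_term_counts:
        if n_true == 0:
            # Terms without any true occurrence have no defined P_miss.
            continue
        p_miss = 1.0 - n_correct / n_true
        p_fa = n_false_alarm / n_nontarget
        penalties.append(p_miss + beta * p_fa)
    if not penalties:
        raise ValueError("no query term has a true occurrence")
    return 1.0 - sum(penalties) / len(penalties)


# Hypothetical counts for a single query term: same hits, different
# numbers of false alarms (insertions).
few_insertions = term_weighted_value([(10, 8, 2, 1000)], beta=100.0)
many_insertions = term_weighted_value([(10, 8, 50, 1000)], beta=100.0)
print(few_insertions > 0, many_insertions < 0)
```

With the same number of correct detections, the score stays positive when false alarms are rare and becomes negative once the weighted false-alarm term exceeds one, which matches the behaviour reported above.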
