Query-by-example spoken term detection evaluation on low-resource languages

As part of the MediaEval 2013 benchmark evaluation campaign, the Spoken Web Search (SWS) task addressed Query-by-Example Spoken Term Detection (QbE-STD): using spoken queries to retrieve matching segments in a set of audio files. As in previous editions, the SWS 2013 evaluation focused on developing technology specifically designed for speech search in a low-resource setting. In this paper, we first describe the main features of past SWS evaluations and then focus on the 2013 SWS task, for which a special effort was made to prepare a challenging database comprising speech in 9 different languages with diverse environment and channel conditions. We review the main novelties of the submitted systems, then present and discuss performance figures, which demonstrate the feasibility of the proposed task even under such challenging conditions. Finally, we analyze the fusion of the 10 top-performing systems. The best fusion provides a 30% relative improvement over the best single system in the evaluation, which shows that a variety of approaches can be effectively combined, each contributing complementary information to the query search.
