MediaEval 2013 Spoken Web Search Task: System Performance Measures

This document discusses how to measure system performance in the Spoken Web Search (SWS) task at MediaEval 2013. The discussion draws on several sources: the NIST 2006 Spoken Term Detection (STD) Evaluation Plan [1], the NIST 2010 Speaker Recognition Evaluation (SRE) Plan [2], the description of the scoring criteria applied in the SWS task at MediaEval 2012 [3], the Albayzin 2012 Language Recognition Evaluation Plan [4] and the NIST 2013 Open Keyword Search (OpenKWS13) Evaluation Plan [5].

The SWS task at MediaEval 2013 is defined as searching for audio content within audio content using an audio content query [6]. The task deals with two sets of multilingual speech content: a set of query examples (one or more examples per query) and a set of audio documents on which searches are performed. Since both the queries and the audio documents may contain different languages, search systems must be language-independent. Each query must be searched independently, that is, without using information from other query searches. This also means that hard decisions must be made separately for each query.

A perfect system would detect the exact locations of all query occurrences in the audio documents and would yield no false detections. As in SWS 2012 [7], the system output will consist of a list of query detections, each including an audio document identifier, a query identifier, a starting time, a duration, a score indicating how likely the detection is (with more positive values indicating more likely occurrences) and a hard (Yes/No) decision. A reference file with the exact locations of all the queries within the audio documents will be used to measure system performance.
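To make the detection record and its comparison against the reference file concrete, the following is a minimal Python sketch. The class and field names (`Detection`, `Reference`, `count_outcomes`) are hypothetical, and the matching rule used here (a detection counts as a hit if it overlaps in time with a not-yet-matched reference occurrence of the same query in the same document) is an assumption made for illustration only; it is not the official SWS 2013 scoring criterion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One line of system output (field names are illustrative)."""
    doc_id: str      # audio document identifier
    query_id: str    # query identifier
    start: float     # starting time, in seconds
    duration: float  # duration, in seconds
    score: float     # more positive = more likely occurrence
    decision: bool   # hard Yes/No decision


@dataclass(frozen=True)
class Reference:
    """One true query occurrence from the reference file."""
    doc_id: str
    query_id: str
    start: float
    duration: float


def overlaps(det: Detection, ref: Reference) -> bool:
    # ASSUMPTION: any temporal overlap counts as a match.
    return (det.doc_id == ref.doc_id
            and det.query_id == ref.query_id
            and det.start < ref.start + ref.duration
            and ref.start < det.start + det.duration)


def count_outcomes(detections, references, query_id):
    """Return (hits, misses, false_alarms) for a single query.

    Each query is scored on its own, using only its Yes decisions.
    """
    refs = [r for r in references if r.query_id == query_id]
    dets = [d for d in detections if d.query_id == query_id and d.decision]
    dets.sort(key=lambda d: d.score, reverse=True)  # best-scored first

    matched = set()  # indices of reference occurrences already hit
    hits = false_alarms = 0
    for d in dets:
        idx = next((i for i, r in enumerate(refs)
                    if i not in matched and overlaps(d, r)), None)
        if idx is None:
            false_alarms += 1
        else:
            matched.add(idx)
            hits += 1
    misses = len(refs) - len(matched)
    return hits, misses, false_alarms


references = [
    Reference("doc1", "q1", 10.0, 1.0),
    Reference("doc2", "q1", 5.0, 1.0),
]
detections = [
    Detection("doc1", "q1", 10.2, 0.9, 2.5, True),   # overlaps a reference
    Detection("doc1", "q1", 30.0, 1.0, 0.3, True),   # no reference there
    Detection("doc2", "q1", 5.1, 1.0, -1.0, False),  # rejected by the system
]
outcome = count_outcomes(detections, references, "q1")
print(outcome)  # (1, 1, 1)
```

In this toy example the first detection is a hit, the second is a false detection, and the occurrence in `doc2` is missed because the system's hard decision was No. Counts of this kind are the raw material for miss and false alarm rates, whatever the exact matching rule.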

[1] Florian Metze, et al. The Spoken Web Search Task at MediaEval 2011, 2012, ICASSP.

[2] N. Brummer, et al. On calibration of language recognition scores, 2006, 2006 IEEE Odyssey - The Speaker and Language Recognition Workshop.

[3] David A. van Leeuwen, et al. On calibration of language recognition scores, 2006, Odyssey.