Pursuing a Moving Target: Iterative Use of Benchmarking of a Task to Understand the Task

Individual tasks carried out within benchmarking initiatives, or campaigns, enable direct comparison of alternative approaches to tackling shared research challenges, and ideally promote new research ideas and foster communities of researchers interested in common or related scientific topics. When a task has a clear predefined use case, it can straightforwardly adopt a well-established framework and methodology; for example, an ad hoc information retrieval task can adopt the standard Cranfield paradigm. On the other hand, in the case of new and emerging tasks which pose more complex challenges in terms of usage scenarios or dataset design, the development of a new task is far from a straightforward process. This letter summarises our reflections on our experiences as task organisers of the Search and Hyperlinking task, from its origins as a Brave New Task at the MediaEval benchmarking campaign (2011-2014) to its current instantiation as a task at the NIST TRECVid benchmark (since 2015). We highlight the challenges encountered in the development of the task over a number of annual iterations, the solutions found so far, and our process for maintaining a vision for the ongoing advancement of the task's ambition.