Overview of the NTCIR-15 We Want Web with CENTRE (WWW-3) Task

This paper provides an overview of the NTCIR-15 We Want Web with CENTRE (WWW-3) task. The task features a Chinese subtask (ad hoc web search) and an English subtask (ad hoc web search, plus replicability and reproducibility), and received 48 runs from 9 teams. We describe the subtasks, the data, the evaluation measures, and the official evaluation results.
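The abstract does not list the official evaluation measures, but ad hoc web search tasks of this kind are typically scored with graded-relevance measures over pooled judgments. As an illustration only, the following minimal sketch computes nDCG@k with a linear gain function; the cutoff k, the gain convention, and the per-topic averaging step are assumptions, not details taken from the task definition.

```python
import math

def ndcg(run_relevances, all_relevances, k):
    """nDCG@k for one topic, using linear gains (one common convention).

    run_relevances: graded relevance of the documents in the ranked run, in rank order.
    all_relevances: every known relevance grade for the topic, used to build the ideal ranking.
    """
    def dcg(gains):
        # Discount each gain by log2(rank + 1), with ranks starting at 1.
        return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))

    ideal_dcg = dcg(sorted(all_relevances, reverse=True))
    return dcg(run_relevances) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical example: a run that places the highly relevant document first.
print(ndcg([2, 0, 1], [2, 1, 0, 0], k=3))  # ~0.95
```

In practice such scores are computed per topic and then averaged over the topic set before runs are compared.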
