Ranking the NTCIR ACLIA IR4QA Systems without Relevance Assessments

We consider the problem of ranking information retrieval systems without relevance assessments in the context of collaborative evaluation forums such as NTCIR and TREC. Our short-term goal is to provide the NTCIR participants with a “system ranking forecast” prior to conducting manual relevance assessments, thereby reducing researchers’ “idle time” and accelerating research. The long-term goal is to semi-automate the repeated evaluation of search engines. Our experiments using the NTCIR-7 ACLIA IR4QA test collections show that pseudo system rankings based on a simple method are highly correlated with the “true” rankings. Encouraged by this positive finding, we plan to release system ranking forecasts to participants of the next round of IR4QA at NTCIR-8.
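The abstract does not specify how the correlation between pseudo and true rankings is measured; below is a minimal illustrative sketch, assuming Kendall's tau (a standard choice for comparing system rankings in IR evaluation) and using hypothetical system names and scores that are not from the paper.

```python
# A minimal sketch (not the paper's code) of comparing a pseudo system
# ranking against the "true" ranking with Kendall's tau.
from scipy.stats import kendalltau

# Hypothetical per-system scores: "true" scores from manual relevance
# assessments vs. scores from an assessment-free method.
true_scores = {"sysA": 0.41, "sysB": 0.35, "sysC": 0.28, "sysD": 0.22}
pseudo_scores = {"sysA": 0.30, "sysB": 0.33, "sysC": 0.19, "sysD": 0.15}

systems = sorted(true_scores)            # fix a common system order
x = [true_scores[s] for s in systems]
y = [pseudo_scores[s] for s in systems]

tau, p_value = kendalltau(x, y)          # rank correlation of the two lists
print(f"Kendall's tau = {tau:.3f} (p = {p_value:.3f})")
```

A tau close to 1 would indicate that the assessment-free ranking nearly reproduces the official ordering of systems.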