Ranking the NTCIR ACLIA IR4QA Systems without Relevance Assessments

We consider the problem of ranking information retrieval systems without relevance assessments in the context of collaborative evaluation forums such as NTCIR and TREC. Our short-term goal is to provide the NTCIR participants with a “system ranking forecast” prior to conducting manual relevance assessments, thereby reducing researchers’ “idle time” and accelerating research. The long-term goal is to semi-automate the repeated evaluation of search engines. Our experiments using the NTCIR-7 ACLIA IR4QA test collections show that pseudo system rankings based on a simple method are highly correlated with the “true” rankings. Encouraged by this positive finding, we plan to release system ranking forecasts to participants of the next round of IR4QA at NTCIR-8.
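The abstract does not specify how the correlation between pseudo and true rankings is measured; below is a minimal illustrative sketch, assuming Kendall's tau (a standard choice for comparing system rankings in IR evaluation) and using hypothetical system names and scores that are not from the paper.

```python
# A minimal sketch (not the paper's code) of comparing a pseudo system
# ranking against the "true" ranking with Kendall's tau.
from scipy.stats import kendalltau

# Hypothetical per-system scores: "true" scores from manual relevance
# assessments vs. scores from an assessment-free method.
true_scores = {"sysA": 0.41, "sysB": 0.35, "sysC": 0.28, "sysD": 0.22}
pseudo_scores = {"sysA": 0.30, "sysB": 0.33, "sysC": 0.19, "sysD": 0.15}

systems = sorted(true_scores)            # fix a common system order
x = [true_scores[s] for s in systems]
y = [pseudo_scores[s] for s in systems]

tau, p_value = kendalltau(x, y)          # rank correlation of the two lists
print(f"Kendall's tau = {tau:.3f} (p = {p_value:.3f})")
```

A tau close to 1 would indicate that the assessment-free ranking nearly reproduces the official ordering of systems.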