Test theory for evaluating reliability of IR test collections
[1] David A. Hull. Stemming algorithms: a case study for detailed evaluation, 1996.
[2] Tetsuya Sakai, et al. On the reliability of information retrieval metrics based on graded relevance, 2007, Inf. Process. Manag.
[3] Charles L. A. Clarke, et al. Efficient construction of large test collections, 1998, SIGIR '98.
[4] Ellen M. Voorhees, et al. The effect of topic set size on retrieval experiment error, 2002, SIGIR '02.
[5] Donna K. Harman, et al. Overview of the TREC 2003 Novelty Track, 2003, TREC.
[6] Ian Soboroff, et al. Overview of the TREC 2004 Novelty Track, 2004, TREC.
[7] Ellen M. Voorhees. Variations in relevance judgments and the measurement of retrieval effectiveness, 2000, Inf. Process. Manag.
[8] L. Crocker, et al. Introduction to Classical and Modern Test Theory, 1986.
[9] Ian Soboroff. Do TREC web collections look like the web?, 2002, SIGIR Forum.
[10] Richard J. Shavelson, et al. Generalizability Theory: A Primer, 1991.
[11] Paul Over, et al. Blind Men and Elephants: Six Approaches to TREC data, 1999, Information Retrieval.
[12] Stephen E. Robertson, et al. The TREC 2002 Filtering Track Report, 2002, TREC.
[13] Randall W. Potter, et al. Confidence intervals on variance components, 1992.
[14] W. Hersh, et al. Factors associated with successful answering of clinical questions using an information retrieval system, 2002, Bulletin of the Medical Library Association.