Low-cost evaluation techniques for information retrieval systems: A review
[1] Ben Carterette, et al. Reusable test collections through experimental design, 2010, SIGIR.
[2] Tetsuya Sakai, et al. Alternatives to Bpref, 2007, SIGIR.
[3] Mark Sanderson, et al. Test Collection Based Evaluation of Information Retrieval Systems, 2010, Found. Trends Inf. Retr..
[4] Mark Baillie, et al. Evaluating epistemic uncertainty under incomplete assessments, 2008, Inf. Process. Manag..
[5] Ellen M. Voorhees, et al. Variations in relevance judgments and the measurement of retrieval effectiveness, 1998, SIGIR '98.
[6] Charles L. A. Clarke, et al. Efficient construction of large test collections, 1998, SIGIR '98.
[7] Mark Sanderson, et al. Relatively relevant: Assessor shift in document judgements, 2010, ADCS 2010.
[8] James Allan, et al. Evaluation over thousands of queries, 2008, SIGIR '08.
[9] Charles L. A. Clarke, et al. Overview of the TREC 2004 Terabyte Track, 2004, TREC.
[10] Ben Carterette, et al. The effect of assessor error on IR system evaluation, 2010, SIGIR.
[11] Emine Yilmaz, et al. Inferring document relevance from incomplete information, 2007, CIKM '07.
[12] Ellen M. Voorhees, et al. Retrieval evaluation with incomplete information, 2004, SIGIR '04.
[13] David Hawking, et al. Overview of the TREC-9 Web Track, 2000, TREC.
[14] Thomas Mandl, et al. Recent Developments in the Evaluation of Information Retrieval Systems: Moving Towards Diversity and Practical Relevance, 2008, Informatica.
[15] Laurence Anthony F. Park, et al. Score adjustment for correction of pooling bias, 2009, SIGIR.
[16] Eero Sormunen, et al. Liberal relevance criteria of TREC: counting on negligible documents?, 2002, SIGIR '02.
[17] Mark Sanderson, et al. Information retrieval system evaluation: effort, sensitivity, and reliability, 2005, SIGIR '05.
[18] Peng Li, et al. Using Clustering to Improve Retrieval Evaluation without Relevance Judgments, 2010, COLING.
[19] Justin Zobel, et al. How reliable are the results of large-scale information retrieval experiments?, 1998, SIGIR '98.
[20] Gabriella Kazai, et al. Crowdsourcing for book search evaluation: impact of HIT design on comparative system ranking, 2011, SIGIR.
[21] James Allan, et al. Incremental test collections, 2005, CIKM '05.
[22] Ian Soboroff, et al. Ranking retrieval systems without relevance judgments, 2001, SIGIR '01.
[23] and software — performance evaluation.
[24] David Hawking, et al. Overview of the TREC-2002 Web Track, 2002, TREC.
[25] Mark Sanderson, et al. Forming test collections with no system pooling, 2004, SIGIR '04.
[26] David Hawking, et al. Overview of TREC-7 Very Large Collection Track, 1997, TREC.
[27] Alistair Moffat, et al. System scoring using partial prior information, 2009, SIGIR.
[28] Justin Zobel, et al. Redundant documents and search effectiveness, 2005, CIKM '05.
[29] Ellen M. Voorhees, et al. Bias and the limits of pooling for large collections, 2007, Information Retrieval.
[30] Anselm Spoerri, et al. Using the structure of overlap between search results to rank retrieval systems without relevance judgments, 2007, Inf. Process. Manag..
[31] Jaana Kekäläinen, et al. Cumulated gain-based evaluation of IR techniques, 2002, TOIS.
[32] Ingemar J. Cox, et al. Selecting a Subset of Queries for Acquisition of Further Relevance Judgements, 2011, ICTIR.
[33] Sri Devi Ravana. Experimental evaluation of information retrieval systems, 2011.
[34] Ingemar J. Cox, et al. Prioritizing relevance judgments to improve the construction of IR test collections, 2011, CIKM '11.
[35] Brian A. Vander Schee. Crowdsourcing: Why the Power of the Crowd Is Driving the Future of Business, 2009.
[36] Ben Carterette, et al. Low cost evaluation in information retrieval, 2010, SIGIR '10.
[37] Cyril Cleverdon, et al. The Cranfield tests on index language devices, 1997.
[38] C. J. van Rijsbergen, et al. Report on the need for and provision of an 'ideal' information retrieval test collection, 1975.
[39] Emine Yilmaz, et al. A statistical method for system evaluation using incomplete judgments, 2006, SIGIR.
[40] Charles L. A. Clarke, et al. Reliable information retrieval evaluation with incomplete and biased judgements, 2007, SIGIR.
[41] Alistair Moffat, et al. Score standardization for inter-collection comparison of retrieval systems, 2008, SIGIR '08.
[42] Per Ahlgren, et al. Evaluation of retrieval effectiveness with incomplete relevance data: Theoretical and experimental comparison of three measures, 2008, Inf. Process. Manag..
[43] Anselm Spoerri, et al. How the overlap between the search results of different retrieval systems correlates with document relevance, 2006, ASIST.
[44] Andrew Trotman, et al. Sound and complete relevance assessment for XML retrieval, 2008, TOIS.
[45] Ingemar J. Cox, et al. Optimizing the cost of information retrieval test collections, 2011, PIKM '11.
[46] Alistair Moffat, et al. Rank-biased precision for measurement of retrieval effectiveness, 2008, TOIS.
[47] Alistair Moffat, et al. Strategic system comparisons via targeted relevance judgments, 2007, SIGIR.
[48] Stephen P. Harter, et al. Variations in Relevance Assessments and the Measurement of Retrieval Effectiveness, 1996, J. Am. Soc. Inf. Sci..
[49] Jaap Kamps, et al. Evaluation effort, reliability and reusability in XML retrieval, 2011, J. Assoc. Inf. Sci. Technol..
[50] R. Fidel. Qualitative methods in information retrieval research, 1993.
[51] Ted S. Sindlinger, et al. Crowdsourcing: Why the Power of the Crowd is Driving the Future of Business, 2010.
[52] Per Ahlgren, et al. Retrieval evaluation with incomplete relevance data: a comparative study of three measures, 2006, CIKM '06.
[53] William Webber, et al. Measurement in information retrieval evaluation, 2010.
[54] Alistair Moffat, et al. Score Aggregation Techniques in Retrieval Experimentation, 2009, ADC.
[55] Jong-Hak Lee, et al. Analyses of multiple evidence combination, 1997, SIGIR '97.
[56] Ben Carterette, et al. Measuring the reusability of test collections, 2010, WSDM '10.
[57] Yue Liu, et al. ICTNET at Web Track 2010 Diversity Task, 2010, TREC.
[58] James Allan, et al. Research methodology in studies of assessor effort for information retrieval evaluation, 2007.
[59] Miles Efron, et al. Query polyrepresentation for ranking retrieval systems without relevance judgments, 2010, J. Assoc. Inf. Sci. Technol..
[60] Ellen M. Voorhees, et al. Overview of the TREC 2004 Robust Retrieval Track, 2004.
[61] James Allan, et al. Minimal test collections for retrieval evaluation, 2006, SIGIR.
[62] Mark Sanderson, et al. Quantifying test collection quality based on the consistency of relevance judgements, 2011, SIGIR.