Pooling-based continuous evaluation of information retrieval systems
Gianluca Demartini | Philippe Cudré-Mauroux | Alberto Tonon