Predicting relevance based on assessor disagreement: analysis and practical applications for search evaluation
Djoerd Hiemstra | Thomas Demeester | Dong Nguyen | Chris Develder | Robin Aly