A Simple Measure to Assess Non-response