Rating Computer-Generated Questions with Mechanical Turk