Evaluating Natural Language Processing Tasks with Low Inter-Annotator Agreement: The Case of Corpus Applications
[1] Pavel Rychlý, et al. Low Inter-Annotator Agreement = An Ill-Defined Problem?, 2014, RASLAN.
[2] Aleš Horák, et al. On Evaluation of Natural Language Processing Tasks - Is Gold Standard Evaluation Methodology a Good Solution?, 2016, ICAART.
[3] Geoffrey Sampson, et al. A proposal for improving the measurement of parse accuracy, 2000.
[4] Adam Kilgarriff, et al. A Quantitative Evaluation of Word Sketches, 2010.
[5] Adam Kilgarriff, et al. Extrinsic Corpus Evaluation with a Collocation Dictionary Task, 2014, LREC.
[6] Geoffrey Sampson, et al. A test of the leaf-ancestor metric for parse accuracy, 2003, Natural Language Engineering.
[7] Kishore Papineni, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.
[8] Ralph Grishman, et al. Evaluating Parsing Strategies Using Standardized Parse Files, 1992, ANLP.
[9] Adam Kilgarriff, et al. The Sketch Engine: ten years on, 2014.