Is my Judge a good One?
[1] Michelle Vanni, et al. Inter-Rater Agreement Measures and the Refinement of Metrics in the PLATO MT Evaluation Paradigm, 2005.
[2] Philipp Koehn, et al. (Meta-) Evaluation of Machine Translation, 2007, WMT@ACL.
[3] Steven Abney, et al. How and Where do People Fail with Time: Temporal Reference Mapping Annotation by Chinese and English Bilinguals, 2006.
[4] John B. Carroll. An experiment in evaluating the quality of translations, 1966, Mech. Transl. Comput. Linguistics.
[5] Chiori Hori, et al. Overview of the IWSLT 2005 Evaluation Campaign, 2005, IWSLT.
[6] J. R. Landis, et al. The measurement of observer agreement for categorical data, 1977, Biometrics.
[7] John S. White, et al. The ARPA MT Evaluation Methodologies: Evolution, Lessons, and Future Approaches, 1994, AMTA.
[8] France, et al. Diagnosing Human Judgments in MT Evaluation: an Example based on the Spanish Language, 2008.
[9] Khalid Choukri, et al. Assessing Human and Automated Quality Judgments in the French MT Evaluation Campaign CESTA, 2007.
[10] Christian Boitet, et al. Towards fairer evaluations of commercial MT systems on basic travel expressions corpora, 2004, IWSLT.
[11] A. Feinstein, et al. High agreement but low kappa: I. The problems of two paradoxes, 1990, Journal of clinical epidemiology.