
Evaluation of NLP Systems

• Evaluation is itself a first-class research activity: creation of effective evaluation methods drives more rapid progress and better communication within a research community (Hirschman, 1998:302f)
• [Before] there were no common measures and no shared data. As a consequence, systems and approaches could not be precisely compared and results could not be replicated (Gaizauskas, 1998:249)

Lack of evaluation history

Jimmy J. Lin | Philip Resnik

[1] Yuval Krymolowski. Using the Distribution of Performance for Studying Statistical NLP Systems and Corpora , 2001, ACL.

[2] Marine Carpuat,et al. Improving Statistical Machine Translation Using Word Sense Disambiguation , 2007, EMNLP.

[3] Lluís Màrquez i Villodre,et al. A Comparison between Supervised Learning Algorithms for Word Sense Disambiguation , 2000, CoNLL/LLL.

[4] Ted Pedersen,et al. Measures of semantic similarity and relatedness in the biomedical domain , 2007, J. Biomed. Informatics.

[5] Eneko Agirre,et al. Word Sense Disambiguation: Algorithms and Applications , 2007 .

[6] Christiane Fellbaum,et al. Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[7] Donna K. Harman,et al. The TREC Test Collections , 2005 .

[8] Philip Resnik,et al. A Perspective on Word Sense Disambiguation Methods and Their Evaluation , 2002 .

[9] Jimmy J. Lin,et al. What Makes a Good Answer? The Role of Context in Question Answering , 2003, INTERACT.

[10] Jimmy J. Lin,et al. Selectively Using Relations to Improve Precision in Question Answering , 2003 .

[11] Philip Resnik,et al. Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language , 1999, J. Artif. Intell. Res.

[12] Janyce Wiebe,et al. Word-Sense Disambiguation Using Decomposable Models , 1994, ACL.

[13] Alon Lavie,et al. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments , 2005, IEEvaluation@ACL.

[14] Yorick Wilks,et al. A Preferential, Pattern-Seeking, Semantics for Natural Language Inference , 1975, Artif. Intell.

[15] S. S. Stevens,et al. On the Theory of Scales of Measurement , 1946, Science.

[16] Leonard R. Sussman,et al. Nominal, Ordinal, Interval, and Ratio Typologies are Misleading , 1993 .

[17] Eduard H. Hovy,et al. Automatic Evaluation of Summaries Using N-gram Co-occurrence Statistics , 2003, NAACL.

[18] David Yarowsky,et al. Statistical Machine Translation: Final Report , 1999 .

[19] Adam Kilgarriff,et al. SENSEVAL: an exercise in evaluating word sense disambiguation programs , 1998, LREC.

[20] Nancy Ide,et al. Introduction to the Special Issue on Word Sense Disambiguation: The State of the Art , 1998, Comput. Linguistics.

[21] George A. Miller,et al. A Semantic Concordance , 1993, HLT.

[22] 吴道平. Everything That Linguists Have Always Wanted to Know About Logic But Were Ashamed to Ask , 1985 .

[23] Daniel Gildea,et al. Corpus Variation and Parser Performance , 2001, EMNLP.

[24] Daniel Gildea,et al. Automatic Labeling of Semantic Roles , 2000, ACL.

[25] Alexander M. Fraser,et al. A Smorgasbord of Features for Statistical Machine Translation , 2004, NAACL.

[26] Michael E. Lesk,et al. Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone , 1986, SIGDOC '86.

[27] Hwee Tou Ng,et al. Getting Serious about Word Sense Disambiguation , 2002 .

[28] Alexander S. Yeh,et al. More accurate tests for the statistical significance of result differences , 2000, COLING.

[29] David Yarowsky,et al. Estimating Upper and Lower Bounds on the Performance of Word-Sense Disambiguation Programs , 1992, ACL.

[30] Margaret King,et al. Evaluating natural language processing systems , 1996, CACM.

[31] B. Efron,et al. A Leisurely Look at the Bootstrap, the Jackknife, and , 1983 .

[32] Daniel Jurafsky,et al. Automatic Labeling of Semantic Roles , 2002, CL.

[33] Adam Kilgarriff. SENSEVAL: An Exercise in Evaluating Word Sense Disambiguation Programs , 1998 .

[34] Louise Guthrie,et al. Lexical Disambiguation using Simulated Annealing , 1992, HLT.

[35] Hwee Tou Ng,et al. Integrating Multiple Knowledge Sources to Disambiguate Word Sense: An Exemplar-Based Approach , 1996, ACL.

[36] Julie Weeds,et al. Finding Predominant Word Senses in Untagged Text , 2004, ACL.

[37] Sanjeev Khudanpur,et al. Language model adaptation using cross-lingual information , 2003, INTERSPEECH.

[38] Karen Sparck Jones,et al. Book Reviews: Evaluating Natural Language Processing Systems: An Analysis and Review , 1996, CL.

[39] Christiane Fellbaum,et al. Making fine-grained and coarse-grained sense distinctions, both manually and automatically , 2006, Natural Language Engineering.

[40] Matthew G. Snover,et al. A Study of Translation Edit Rate with Targeted Human Annotation , 2006, AMTA.

[41] Ellen M. Voorhees,et al. Corpus-Based Statistical Sense Resolution , 1993, HLT.

[42] David Yarowsky,et al. Word-Sense Disambiguation Using Statistical Models of Roget’s Categories Trained on Large Corpora , 1992, COLING.

[43] Ralph Weischedel,et al. A Study of Translation Error Rate with Targeted Human Annotation , 2005 .

[44] Richard M. Schwartz,et al. A Methodology for Extrinsic Evaluation of Text Summarization: Does ROUGE Correlate? , 2005, IEEvaluation@ACL.

[45] Philip Resnik,et al. Distinguishing systems and distinguishing senses: new evaluation methods for Word Sense Disambiguation , 1999 .

[46] Yehoshua Bar-Hillel,et al. The Present Status of Automatic Translation of Languages , 1960, Adv. Comput..

[47] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.