A Rough Set Formalization of Quantitative Evaluation with Ambiguity

In this paper, we present the founding elements of a formal model of the evaluation paradigm in natural language processing. We propose an abstract model of objective quantitative evaluation based on rough sets, together with the notion of a potential performance space, which describes the performance variations arising from the ambiguity present in hypothesis data produced by a computer program when it is compared with reference data created by humans. A formal model of the evaluation paradigm will be useful for comparing evaluation protocols, investigating the relaxation of evaluation constraints, and gaining a better understanding of the evaluation paradigm, provided it is general enough to represent any natural language processing task.
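
To make the rough set machinery referred to above concrete, the sketch below implements the standard Pawlak lower and upper approximations of a target set, whose difference (the boundary region) is exactly where evaluation ambiguity lives. This is a minimal illustration of the general technique, not the paper's own model; all names in it (`approximations`, `attrs`, the toy tagging data) are illustrative assumptions.

```python
# Minimal sketch of Pawlak rough set approximations (illustrative only).
# Given a universe U, an equivalence relation induced by a labeling
# function `attrs`, and a target set X, the lower approximation gathers
# the equivalence classes fully contained in X, and the upper
# approximation gathers those that merely intersect X. The boundary
# (upper minus lower) is the ambiguous region that a "potential
# performance space" would have to quantify.
from collections import defaultdict

def approximations(universe, attrs, target):
    """Return the (lower, upper) rough set approximations of `target`."""
    # Partition the universe into equivalence classes by attribute value.
    classes = defaultdict(set)
    for x in universe:
        classes[attrs(x)].add(x)

    lower, upper = set(), set()
    for cls in classes.values():
        if cls <= target:    # class entirely inside the target set
            lower |= cls
        if cls & target:     # class overlapping the target set
            upper |= cls
    return lower, upper

# Toy example: tokens judged correct by a reference, grouped by the tag
# a system assigned; tokens sharing a tag are indiscernible here.
universe = {"t1", "t2", "t3", "t4", "t5"}
tag = {"t1": "N", "t2": "N", "t3": "V", "t4": "V", "t5": "ADJ"}.__getitem__
correct = {"t1", "t2", "t3"}

low, up = approximations(universe, tag, correct)
print("lower:", sorted(low))       # ['t1', 't2']: certainly correct
print("upper:", sorted(up))        # ['t1', 't2', 't3', 't4']: possibly correct
print("boundary:", sorted(up - low))  # ['t3', 't4']: the ambiguous region
```

Under this reading, a system's score is bracketed rather than pinned to a single number: the lower approximation bounds the performance that can be credited with certainty, while the upper approximation bounds what could be credited under the most favorable resolution of the ambiguity.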
