Intrinsic versus Extrinsic Evaluations of Parsing Systems

A wide range of parser and grammar evaluation methods has been reported in the literature. In most cases, however, these evaluations assess the parsers in isolation (intrinsic evaluations); only in a few cases has the effect of different parsers on real applications been measured (extrinsic evaluations). This paper compares two evaluations of the Link Grammar parser and the Conexor Functional Dependency Grammar (FDG) parser. Although both parsing systems are dependency-based, they return different types of dependencies, which makes a direct comparison impossible. In the intrinsic evaluation, each parser's accuracy is measured independently by converting its dependencies into grammatical relations and applying the parser-comparison methodology of Carroll et al. (1998). In the extrinsic evaluation, the parsers are compared by their impact on a practical application, answer extraction. The intrinsic and extrinsic evaluations yield significantly different results.
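To make the intrinsic scoring step more concrete, here is a minimal Python sketch of grammatical-relation (GR) based comparison. It assumes each parse is reduced to a set of (relation, head, dependent) triples and scored by precision, recall and F1 against a gold-standard set. The label mapping `LABEL_TO_GR`, the helper names `to_grs` and `score`, and the example sentence are hypothetical illustrations, not the conversion rules used in the paper. Link Grammar links are undirected, so deciding which word counts as the head is itself an assumption of any such conversion.

```python
from typing import Dict, Iterable, Set, Tuple

# A grammatical relation (GR) as a (relation, head, dependent) triple.
GR = Tuple[str, str, str]

# Hypothetical mapping from parser-specific dependency labels to GR labels.
# The entries are illustrative only, not the mapping used in the paper.
LABEL_TO_GR: Dict[str, str] = {
    "S": "ncsubj",     # Link Grammar subject link (illustrative)
    "O": "dobj",       # Link Grammar object link (illustrative)
    "subj": "ncsubj",  # FDG-style subject label (illustrative)
    "obj": "dobj",     # FDG-style object label (illustrative)
}


def to_grs(deps: Iterable[Tuple[str, str, str]]) -> Set[GR]:
    """Convert (label, head, dependent) dependencies into GR triples.

    Labels with no entry in LABEL_TO_GR are dropped, so any relation the
    mapping cannot express simply counts against recall.
    """
    return {
        (LABEL_TO_GR[label], head, dep)
        for label, head, dep in deps
        if label in LABEL_TO_GR
    }


def score(test: Set[GR], gold: Set[GR]) -> Tuple[float, float, float]:
    """Return precision, recall and F1 of test GRs against gold GRs.

    A triple is correct only if relation, head and dependent all match.
    """
    correct = len(test & gold)
    precision = correct / len(test) if test else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical gold annotation for "John saw Mary yesterday".
    gold = {
        ("ncsubj", "saw", "John"),
        ("dobj", "saw", "Mary"),
        ("ncmod", "saw", "yesterday"),
    }
    # Hypothetical parser output; the MV link has no mapping in this sketch.
    parsed = to_grs([
        ("S", "saw", "John"),
        ("O", "saw", "Mary"),
        ("MV", "saw", "yesterday"),
    ])
    p, r, f = score(parsed, gold)
    print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # prints P=1.000 R=0.667 F1=0.800
```

Scoring whole triples is deliberately strict: a relation with the right label but the wrong head counts as both a precision and a recall error. A per-relation breakdown could be obtained by grouping the triples by label before calling `score`.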

[1] Daniel Dominic Sleator et al. Parsing English with a Link Grammar. IWPT, 1995.

[2] Timo Järvinen et al. A non-projective dependency parser. ANLP, 1997.

[3] John Carroll, Ted Briscoe, et al. Parser evaluation: a survey and a new proposal. LREC, 1998.

[4] Ellen M. Voorhees. The TREC question answering track. 2001.

[5] Margaret King et al. Evaluating natural language processing systems. CACM, 1996.

[6] Jerry R. Hobbs. Ontological Promiscuity. ACL, 1985.

[7] Ellen M. Voorhees et al. Overview of the TREC 2004 Novelty Track. 2005.

[8] Dan Flickinger et al. Minimal Recursion Semantics: An Introduction. 2005.

[9] Annette McElligott et al. Industrial Parsing of Software Manuals. 1996.

[10] Ezra Black. Evaluation of broad-coverage natural-language parsers. 1997.

[11] David L. Davidson et al. The Logical Form of Action Sentences. 2001.

[12] Michael Collins et al. A New Statistical Parser Based on Bigram Lexical Dependencies. ACL, 1996.

[13] Rolf Schwitter et al. ExtrAns, an answer extraction system. 2000.

[14] R. Hursthouse. The Logic of Decision and Action. 1969.

[15] Dekang Lin et al. A dependency-based method for evaluating broad-coverage parsers. Natural Language Engineering, 1995.

[16] Ted Briscoe et al. Corpus Annotation for Parser Evaluation. ArXiv, 1999.

[17] Ralph Grishman et al. A Procedure for Quantitatively Comparing the Syntactic Coverage of English Grammars. HLT, 1991.

[18] Beth Ann Hockey et al. Grammar & Parser Evaluation in the XTAG Project. 1998.