Through Low-Cost Annotation to Reliable Parsing Evaluation

In this paper, we present an application-driven, low-cost concept for building a multi-purpose language resource for Czech, based on currently available results of previous work by various research teams active in natural language processing. We focus in particular on the first phase, which consists of extracting noun phrases from a morphologically annotated corpus and providing a simple, easy-to-use application for verifying them. For the extraction task, three Czech parsers have been adapted and evaluated. Finally, we discuss the results achieved so far in the context of ongoing work and show that they are consistent and reliable.
