Bank of English and Beyond: Hand-crafted parsers for functional annotation

The 200 million word corpus of the Bank of English was annotated morphologically and syntactically using the English Constraint Grammar analyser, a rule-based shallow parser developed at the Research Unit for Computational Linguistics, University of Helsinki. We discuss the annotation system and methods used in the corpus work, as well as the theoretical assumptions of the Constraint Grammar syntax. Based on our experience in large-scale corpus work, we argue for a deeper and more explicit, dependency-based syntactic representation. We present a new practical parsing system, the Functional Dependency Grammar parser, developed from the Constraint Grammar system, and discuss its suitability for treebank annotation.
