LinGO Redwoods

Reflecting an increased need for stochastic parse selection models over hand-built linguistic grammars and a lack of appropriately detailed training material, we present the Linguistic Grammars On-Line (LinGO) Redwoods initiative, a seed activity in the design and development of a new type of treebank. LinGO Redwoods aims to develop a novel treebanking methodology that is (i) rich in linguistic detail and dynamic in both (ii) the ways linguistic data can be retrieved from the treebank at varying granularity and (iii) the constant evolution and regular updating of the treebank itself, synchronized with the development of ideas in syntactic theory. Since June 2001, the project has been working to build the foundations for this new type of treebank, develop a basic set of tools for treebank construction and maintenance, and construct an initial set of 10,000 annotated trees, to be distributed together with the tools under an open-source license.
