The importance of annotated corpora for NLP: the cases of anaphora resolution and clause splitting

In this paper we present two applications that depend on annotated corpora for their implementation, evaluation and improvement. The first is an automatic anaphora resolution system. After describing the algorithm, we discuss the importance of corpora for evaluation and automatic scoring, as well as the development of a coreferentially annotated corpus. We then look ahead to the role of corpora in optimisation and semi-automatic annotation. The second application uses an annotated corpus with a machine learning algorithm for clause splitting. We show that the method minimises the number of hand-made rules necessary to achieve a good result.
