Fast NP Chunking Using Memory-Based Learning Techniques

In this paper we discuss the application of Memory-Based Learning (MBL) to fast NP chunking. We first apply a fast decision-tree variant of MBL (IGTree) to the dataset described in (Ramshaw and Marcus, 1995), which consists of roughly 50,000 test items and 200,000 training items. In a second series of experiments we use an architecture of two cascaded IGTrees. At the second level of this cascaded classifier we add context predictions as extra features so that incorrect predictions from the first level can be corrected, yielding a generalisation accuracy of 97.2% with training and testing times on the order of seconds to minutes. Recall and precision for predicting NP chunks are 94.3% and 89.0%, respectively.
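The way second-level instances might be assembled from first-level output can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the function name `add_context_predictions`, the padding symbol `PAD`, the one-token window, the (word, POS) features, and the toy tags are all assumptions introduced here, and the classifiers themselves are left out.

```python
from typing import List, Sequence

PAD = "_"  # hypothetical padding symbol for positions outside the sentence


def add_context_predictions(
    features: Sequence[Sequence[str]],
    first_level_tags: Sequence[str],
    window: int = 1,
) -> List[List[str]]:
    """Build second-level instances for one sentence.

    Each instance keeps its original features and gains the first-level
    predictions of the `window` tokens to its left and to its right.
    """
    if len(features) != len(first_level_tags):
        raise ValueError("one first-level tag is needed per token")
    n = len(first_level_tags)
    extended = []
    for i, feats in enumerate(features):
        context = []
        for offset in range(-window, window + 1):
            if offset == 0:
                continue  # assumption: only neighbouring predictions are added
            j = i + offset
            context.append(first_level_tags[j] if 0 <= j < n else PAD)
        extended.append(list(feats) + context)
    return extended


# Toy sentence: (word, POS) features and hypothetical first-level chunk tags.
features = [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")]
first_level_tags = ["I", "I", "O"]
second_level = add_context_predictions(features, first_level_tags)
```

A second classifier trained on such extended instances sees what the first level predicted for neighbouring tokens, which is what would allow it to revise a first-level decision that is inconsistent with its context.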

[1] Royal Skousen, et al. Real-Time Morphology: Symbolic Rules or Analogical Networks?, 1989.

[2] Hwee Tou Ng, et al. Integrating Multiple Knowledge Sources to Disambiguate Word Sense: An Exemplar-Based Approach, 1996, ACL.

[3] David W. Aha, et al. Incremental Constructive Induction: An Instance-Based Approach, 1991, ML.

[4] Beatrice Santorini, et al. Building a Large Annotated Corpus of English: The Penn Treebank, 1993, CL.

[5] Vito Pirrelli, et al. Analogy, computation and linguistic theory, 1997.

[6] Philip J. Stone, et al. Experiments in induction, 1966.

[7] Pat Langley, et al. Oblivious Decision Trees and Abstract Cases, 1994.

[8] Steven Abney, et al. Parsing By Chunks, 1991.

[9] Walter Daelemans, et al. MBT: A Memory-Based Part of Speech Tagger-Generator, 1996, VLC@COLING.

[10] Walter Daelemans, et al. IGTree: Using Trees for Compression and Classification in Lazy Learning Algorithms, 1997, Artificial Intelligence Review.

[11] Steve Chandler, et al. Are rules and modules really necessary for explaining language?, 1993, Journal of Psycholinguistic Research.

[12] Michael Collins, et al. A New Statistical Parser Based on Bigram Lexical Dependencies, 1996, ACL.

[13] Walter Daelemans, et al. Rapid Development of NLP Modules with Memory-based Learning, 1998.

[14] Walter Daelemans, et al. Generalization performance of backpropagation learning on a syllabification task, 1992.

[15] David L. Waltz, et al. Toward memory-based reasoning, 1986, CACM.

[16] A.P.J. van den Bosch, et al. Learning to pronounce written words: a study in inductive language learning, 1997.

[17] Mitchell P. Marcus, et al. Text Chunking using Transformation-Based Learning, 1995, VLC@ACL.

[18] Royal Skousen, et al. Analogical Modeling Of Language, 1989.

[19] Claire Cardie, et al. Automating Feature Set Selection for Case-Based Learning of Linguistic Knowledge, 1996, EMNLP.

[20] Adwait Ratnaparkhi, et al. A Linear Observed Time Statistical Parser Based on Maximum Entropy Models, 1997, EMNLP.

[21] Walter Daelemans, et al. Language-Independent Data-Oriented Grapheme-to-Phoneme Conversion, 1996.