Shallow Parsing with PoS Taggers and Linguistic Features

Three publicly available data-driven part-of-speech taggers are applied to shallow parsing of Swedish texts. The phrase structure is represented by nine phrase types arranged hierarchically, so that each token is labelled with every constituent type it belongs to in the parse tree. The encoding concatenates the phrase tags on the path from the lowest node to the higher nodes. Various linguistic features are used in learning: the taggers are trained on lexical information only, on part-of-speech only, and on a combination of both, to predict the phrase structure of the tokens with or without part-of-speech. Special attention is paid to the taggers' sensitivity to the types of linguistic information included in learning, and to the size and type of the training data sets. The method can be easily transferred to other languages.
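To make the encoding concrete, the following is a minimal sketch of how per-token labels could be derived by concatenating phrase tags from the lowest node upward. It is an illustration only: the tree representation, the function name `encode_tokens`, the labels `NP` and `PP`, the `_` separator, and the `0` tag for tokens outside any phrase are all assumptions, not details taken from the paper.

```python
from typing import List, Tuple, Union

# Hypothetical tree representation: a leaf is a token string,
# an inner node is a (phrase_label, children) pair.
Tree = Union[str, Tuple[str, list]]


def encode_tokens(tree: Tree, path: Tuple[str, ...] = ()) -> List[Tuple[str, str]]:
    """Return (token, tag) pairs for every leaf in the tree.

    `path` holds the phrase labels from the root down to the current node,
    so reversing it gives the lowest-to-higher order described above.
    """
    if isinstance(tree, str):
        # Assumed convention: "0" marks a token that is outside any phrase.
        tag = "_".join(reversed(path)) if path else "0"
        return [(tree, tag)]
    label, children = tree
    pairs: List[Tuple[str, str]] = []
    for child in children:
        pairs.extend(encode_tokens(child, path + (label,)))
    return pairs


# A prepositional phrase containing a noun phrase (English tokens for readability).
example = ("PP", ["on", ("NP", ["the", "table"])])
for token, tag in encode_tokens(example):
    print(token, tag)
```

With labels of this kind, each token carries a single composite tag, which is what allows an ordinary part-of-speech tagger to be trained on the phrase-structure task.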
