A Machine Learning Approach to POS Tagging

We have applied the inductive learning of statistical decision trees and relaxation labeling to the Natural Language Processing (NLP) task of morphosyntactic disambiguation, i.e., Part-of-Speech (POS) tagging. The learning process is supervised and yields a language model aimed at resolving POS ambiguities. The model consists of a set of statistical decision trees that express the distribution of tags and words in relevant contexts. The acquired decision trees have been used directly in a tagger that is both relatively simple and fast, and that has been tested and evaluated on the Wall Street Journal (WSJ) corpus with competitive accuracy. However, better results can be obtained by translating the trees into rules that feed a flexible tagger based on relaxation labeling. Along these lines, we describe a tagger that can use information of any kind (n-grams, automatically acquired constraints, linguistically motivated hand-written constraints, etc.) and, in particular, can incorporate the machine-learned decision trees. We also address the problem of tagging when only limited training material is available, which is crucial whenever an annotated corpus is being built from scratch. We show that our system achieves high accuracy in this situation, and we report results obtained when using it to develop a 5.5-million-word Spanish corpus from scratch.
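To make the idea of a relaxation-labeling tagger that accepts "information of any kind" more concrete, here is a minimal sketch of a generic relaxation labeling loop. It rests on several assumptions that are not taken from the text above: constraints are hypothetical `(offset, context_tag, target_tag, weight)` tuples, the lexical priors and weights are made-up toy numbers, the support of a tag is a clamped weighted sum, and the update is the classic form p'(t) ∝ p(t)(1 + S(t)). It is an illustration of the general technique, not necessarily the exact support or update function used by the tagger described here.

```python
from typing import Dict, List, Tuple

# Hypothetical constraint format: (offset of the context word, tag required
# at that context position, tag being supported at the current word, weight).
# Positive weights reward the tag; negative weights penalize it.
Constraint = Tuple[int, str, str, float]
TagDist = Dict[str, float]


def support(probs: List[TagDist], i: int, tag: str,
            constraints: List[Constraint]) -> float:
    """Sum the weighted evidence for `tag` at position i, clamped to [-1, 1]."""
    s = 0.0
    for offset, ctx_tag, target, weight in constraints:
        j = i + offset
        if target == tag and 0 <= j < len(probs):
            # Evidence counts in proportion to the current belief in the context tag.
            s += weight * probs[j].get(ctx_tag, 0.0)
    return max(-1.0, min(1.0, s))


def relax(priors: List[TagDist], constraints: List[Constraint],
          iterations: int = 50) -> List[TagDist]:
    """Synchronously update every word's tag distribution for a fixed number of steps."""
    probs = [dict(p) for p in priors]
    for _ in range(iterations):
        new_probs = []
        for i, dist in enumerate(probs):
            # Update rule: p'(t) proportional to p(t) * (1 + S(t)), then renormalize.
            unnorm = {t: p * (1.0 + support(probs, i, t, constraints))
                      for t, p in dist.items()}
            z = sum(unnorm.values())
            # If every tag was fully penalized, keep the previous distribution.
            new_probs.append({t: v / z for t, v in unnorm.items()} if z > 0 else dict(dist))
        probs = new_probs
    return probs


def best_tags(probs: List[TagDist]) -> List[str]:
    """Pick the most probable tag for each word."""
    return [max(d, key=d.get) for d in probs]


# Toy example: "the can rusts" with made-up lexical priors and constraints.
priors = [
    {"DT": 1.0},               # the
    {"MD": 0.7, "NN": 0.3},    # can
    {"VBZ": 0.6, "NNS": 0.4},  # rusts
]
constraints = [
    (-1, "DT", "NN", 0.8),    # a noun is likely right after a determiner
    (-1, "DT", "MD", -0.8),   # a modal is unlikely right after a determiner
    (-1, "NN", "VBZ", 0.5),   # a 3rd-person verb is likely after a noun
]
result = relax(priors, constraints)
print(best_tags(result))  # ['DT', 'NN', 'VBZ']
```

Even though the lexical priors favor the modal reading of "can", the contextual constraints pull it toward the noun reading, which in turn supports the verb reading of "rusts". Because every knowledge source only contributes weighted constraints, rules translated from decision trees, n-gram statistics, or hand-written constraints could all be appended to the same list without changing the update loop. Clamping S(t) to [-1, 1] is a simplification chosen here to keep 1 + S(t) nonnegative.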