Tagging with Disambiguation Rules - A New Evolutionary Approach to the Part-of-Speech Tagging Problem

In this paper we present an evolutionary approach to the part-of-speech tagging problem. The goal of part-of-speech tagging is to assign to each word of a text its part of speech. The task is not straightforward, because a large percentage of words have more than one possible part of speech, and the right choice is determined by the parts of speech of the surrounding words. Solving the problem therefore requires a method for disambiguating each word's set of possible tags. Traditionally, two groups of methods have been used to tackle this task. The first group is based on statistical data about the different contexts in which a word can occur, while the second group is based on rules, normally designed by human experts, that capture properties of the language. In this work we present a solution that tries to incorporate both approaches. The proposed system is divided into two components. First, we use an evolutionary algorithm that evolves a set of disambiguation rules for each part-of-speech tag in the training corpus. We then use a second evolutionary algorithm, guided by the rules found earlier, to solve the tagging problem. The results obtained on two different corpora are among the best published for those corpora.
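As a rough illustration of the second component only, the sketch below searches for a tag sequence with a small evolutionary loop whose fitness is driven by context rules. It is not the system described in the paper: the lexicon, the tag names, the rule format `(left_tag, tag, right_tag)`, the scores, and the mutation-only search with elitist selection are all assumptions made for this example, and the rules are written by hand rather than evolved by a first algorithm.

```python
import random

# Hypothetical toy lexicon: each word maps to its set of possible tags.
LEXICON = {
    "the": ["DET"],
    "can": ["NOUN", "VERB", "AUX"],
    "rusts": ["VERB", "NOUN"],
}

# Hypothetical disambiguation rules: ((left_tag, tag, right_tag), score).
# None acts as a wildcard. In the paper these would come from the first
# evolutionary algorithm; here they are hand-written assumptions.
RULES = [
    (("DET", "NOUN", None), 1.0),
    (("NOUN", "VERB", None), 1.0),
    ((None, "AUX", "VERB"), 0.5),
]


def fitness(tags):
    """Sum the scores of all rules that match some position in the sequence."""
    total = 0.0
    for i, tag in enumerate(tags):
        left = tags[i - 1] if i > 0 else "START"
        right = tags[i + 1] if i + 1 < len(tags) else "END"
        for (rule_left, rule_tag, rule_right), score in RULES:
            if (rule_tag == tag
                    and rule_left in (None, left)
                    and rule_right in (None, right)):
                total += score
    return total


def evolve_tags(words, generations=30, pop_size=20, seed=0):
    """Evolve a tag sequence for `words` using mutation and elitist selection."""
    rng = random.Random(seed)
    # Each individual assigns one allowed tag to every word.
    population = [[rng.choice(LEXICON[w]) for w in words]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half unchanged, so the best individual is never lost.
        population.sort(key=fitness, reverse=True)
        survivors = population[:pop_size // 2]
        children = []
        for parent in survivors:
            child = list(parent)
            i = rng.randrange(len(words))
            # Mutation: re-draw the tag of one word from its allowed tags.
            child[i] = rng.choice(LEXICON[words[i]])
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)


best = evolve_tags(["the", "can", "rusts"])
print(best, fitness(best))
```

In this toy setting the ambiguous word "can" is resolved by its neighbours: the sequence that satisfies the most rules tags it as a noun after the determiner and tags "rusts" as a verb after that noun. A search space this small could be enumerated directly; the evolutionary loop only becomes useful for longer sentences, where the number of candidate sequences grows multiplicatively with the number of ambiguous words.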
