Transformation Rule Learning without Rule Templates: A Case Study in Part of Speech Tagging

Part-of-speech (POS) tagging is an important problem and one of the first steps in many natural language processing tasks. It directly affects the accuracy of downstream tasks such as syntactic parsing, word sense disambiguation, and machine translation. Stochastic models solve this problem relatively well, but they still make mistakes. Transformation-based learning (TBL) can improve stochastic taggers by learning a set of transformation rules that correct their output. However, its rule learning algorithm has two disadvantages: rule templates must be prepared by hand, and only rules that are instances of those templates can be generated. In this paper, we propose a model that learns transformation rules without rule templates by treating the rule learning problem as a feature selection problem. Experiments on the Penn Treebank showed that the proposed model reduces the errors of stochastic taggers for some tags.
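To make the notion of a rule template concrete, the following is a minimal sketch of how a conventional template-based transformation rule might correct a tagger's output. It illustrates the template-bound setting that the abstract describes as a limitation, not the proposed model. The template ("change tag A to B when the previous tag is Z"), the names `PrevTagRule`, `apply_rule`, and `count_errors`, and the toy sentence are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class PrevTagRule(NamedTuple):
    """One instance of a hypothetical hand-written template:
    'change from_tag to to_tag when the previous tag is prev_tag'."""
    from_tag: str
    to_tag: str
    prev_tag: str


def apply_rule(tags: List[str], rule: PrevTagRule) -> List[str]:
    """Return a copy of tags with the rule applied at every matching position.

    Conditions are checked against the input tags, so a rewrite made
    in this pass never triggers another rewrite in the same pass.
    """
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == rule.from_tag and tags[i - 1] == rule.prev_tag:
            out[i] = rule.to_tag
    return out


def count_errors(predicted: List[str], gold: List[str]) -> int:
    """Number of positions where the predicted tag differs from the gold tag."""
    return sum(p != g for p, g in zip(predicted, gold))


# Toy data (invented): an initial tagger mislabels "race" as a noun.
words = ["They", "want", "to", "race"]
initial = ["PRP", "VBP", "TO", "NN"]
gold = ["PRP", "VBP", "TO", "VB"]

# A rule instantiated from the template: NN -> VB after TO.
rule = PrevTagRule(from_tag="NN", to_tag="VB", prev_tag="TO")
corrected = apply_rule(initial, rule)

print(count_errors(initial, gold), count_errors(corrected, gold))
```

In this setting a learner can only search over rules that fill the slots of such hand-written templates, which is the restriction that motivates learning rules without templates.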