Part of Speech Tagging Using a Network of Linear Separators

We present an architecture and an on-line learning algorithm and apply it to the problem of part-of-speech tagging. The architecture presented, SNOW, is a network of linear separators in the feature space, utilizing the Winnow update algorithm.Multiplicative weight-update algorithms such as Winnow have been shown to have exceptionally good behavior when applied to very high dimensional problems, and especially when the target concepts depend on only a small subset of the features in the feature space. In this paper we describe an architecture that utilizes this mistake-driven algorithm for multi-class prediction-selecting the part of speech of a word. The experimental analysis presented here provides more evidence to that these algorithms are suitable for natural language problems.The algorithm used is an on-line algorithm: every example is used by the algorithm only once, and is then discarded. This has significance in terms of efficiency, as well as quick adaptation to new contexts.We present an extensive experimental study of our algorithm under various conditions; in particular, it is shown that the algorithm performs comparably to the best known algorithms for POS.

[1]  Richard O. Duda,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[2]  N. Littlestone Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm , 1987, 28th Annual Symposium on Foundations of Computer Science (sfcs 1987).

[3]  N. Littlestone Learning Abound: Quickly When Irrelevant Attributes A New Linear-threshold Algorithm , 1988 .

[4]  Julian M. Kupiec,et al.  Robust part-of-speech tagging using a hidden Markov model , 1992 .

[5]  Manfred K. Warmuth,et al.  The Weighted Majority Algorithm , 1994, Inf. Comput..

[6]  Helmut Schmid,et al.  Part-of-Speech Tagging With Neural Networks , 1994, COLING.

[7]  Claire Cardie,et al.  Embedded machine learning systems for natural language processing: a general framework , 1995, Learning for Natural Language Processing.

[8]  Eric Brill,et al.  Transformation-Based Error-Driven Learning and Natural Language Processing: A Case Study in Part-of-Speech Tagging , 1995, CL.

[9]  Manfred K. Warmuth,et al.  Additive versus exponentiated gradient updates for linear prediction , 1995, STOC '95.

[10]  Hinrich Schütze Distributional Part-of-Speech Tagging , 1995, EACL.

[11]  Hinrich Schütze,et al.  Distributional Part-of-Speech Tagging , 1995, EACL.

[12]  Eric Brill,et al.  Unsupervised Learning of Disambiguation Rules for Part of Speech Tagging , 1995, VLC@ACL.

[13]  L. Ramshaw,et al.  Explor-ing the nature of transformation-based learning , 1996 .

[14]  Judith L. Klavans,et al.  Book Reviews: The Balancing Act: Combining Symbolic and Statistical Approaches to Language , 1997, CL.

[15]  Manfred K. Warmuth,et al.  Exponentiated Gradient Versus Gradient Descent for Linear Predictors , 1997, Inf. Comput..

[16]  Dan Roth,et al.  Incorporating Knowledge in Natural Language Learning: A Case Study , 1998, WordNet@ACL/COLING.

[17]  Dan Roth,et al.  Learning to Resolve Natural Language Ambiguities: A Unified Approach , 1998, AAAI/IAAI.

[18]  Terrence J. Sejnowski,et al.  Unsupervised Learning , 2018, Encyclopedia of GIS.