Tagging Urdu Text with Parts of Speech: A Tagger Comparison

In this paper, four state-of-the-art probabilistic taggers, i.e. the TnT tagger, TreeTagger, RFTagger and SVMTool, are applied to the Urdu language. For the purpose of the experiment, a syntactic tagset is proposed. A training corpus of 100,000 tokens is used to train the models. Using the lexicon extracted from the training corpus, SVMTool achieves the highest accuracy, 94.15%. When a separate lexicon of 70,568 types is provided, SVMTool again performs best, with an accuracy of 95.66%.
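The abstract compares two settings: a lexicon extracted from the training corpus, and the same setup with a larger, separate lexicon. The sketch below only illustrates those two ideas. It does not implement any of the four taggers. Instead it uses a simple most-frequent-tag baseline with token-level accuracy. The romanized Urdu toy corpus and the coarse tags (`PRON`, `NOUN`, `VERB`, `ADP`) are hypothetical placeholders, not the paper's data or tagset. The sketch shows one reason an external lexicon can help: words unseen in training otherwise fall back to a default tag.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List, Tuple

Lexicon = DefaultDict[str, Counter]

def extract_lexicon(corpus: List[Tuple[str, str]]) -> Lexicon:
    """Map each word type to a Counter of the tags it was observed with."""
    lexicon: Lexicon = defaultdict(Counter)
    for word, tag in corpus:
        lexicon[word][tag] += 1
    return lexicon

def merge_lexicons(base: Lexicon, extra: Lexicon) -> Lexicon:
    """Combine two lexicons by summing tag counts per word type."""
    merged: Lexicon = defaultdict(Counter)
    for lex in (base, extra):
        for word, tags in lex.items():
            merged[word].update(tags)
    return merged

def tag_with_lexicon(words: List[str], lexicon: Lexicon, default_tag: str) -> List[str]:
    """Baseline tagger (not TnT/TreeTagger/RFTagger/SVMTool): give each known
    word its most frequent lexicon tag; unknown words get default_tag."""
    return [
        lexicon[w].most_common(1)[0][0] if w in lexicon else default_tag
        for w in words
    ]

def accuracy(predicted: List[str], gold: List[str]) -> float:
    """Token-level accuracy: share of tokens whose predicted tag equals the gold tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold sequences differ in length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

# Hypothetical toy data (romanized Urdu, illustrative tags only).
train = [("woh", "PRON"), ("ghar", "NOUN"), ("gaya", "VERB"),
         ("ghar", "NOUN"), ("ka", "ADP"), ("kitab", "NOUN")]
test_words = ["woh", "kitab", "parhi"]
gold = ["PRON", "NOUN", "VERB"]

lex = extract_lexicon(train)
# "parhi" is unseen in training, so it falls back to the default tag.
base_acc = accuracy(tag_with_lexicon(test_words, lex, "NOUN"), gold)

# A hypothetical external lexicon that covers the unseen word.
extra = extract_lexicon([("parhi", "VERB")])
full_acc = accuracy(tag_with_lexicon(test_words, merge_lexicons(lex, extra), "NOUN"), gold)

print(base_acc, full_acc)
```

In this toy run, the training-only lexicon tags two of three tokens correctly. Adding the external entry fixes the unknown word. Real taggers such as TnT also handle unknown words with suffix and context models, so the gains reported in the abstract should not be read off this baseline.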
