Modeling POS Tagging for the Urdu Language

This paper presents Part-of-Speech (POS) taggers for Urdu, a low-resource language. POS tagging is a core preprocessing step in many natural language processing tasks, such as sentiment classification, syntactic parsing, and named-entity recognition. The proposed taggers use two state-of-the-art models widely applied to sequence tagging: the Conditional Random Field (CRF) and the Bidirectional Long Short-Term Memory CRF (BiLSTM CRF). This work is the first to apply the BiLSTM CRF model to POS tagging for Urdu. Both models achieved an F1 score of 96% on the test data, outperforming existing Urdu POS taggers by a significant margin.
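
Both models named above end in a linear-chain CRF layer, which scores whole tag sequences rather than individual tokens. As a rough illustration of what that means at prediction time, the sketch below implements standard Viterbi decoding over per-token (emission) scores and tag-to-tag (transition) scores. It is a minimal sketch under stated assumptions, not the taggers described in this paper: the function name `viterbi_decode`, the two-tag set, and all scores are hypothetical toy values chosen for illustration.

```python
from typing import Dict, List, Tuple


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag sequence for one sentence.

    emissions[t][tag] is the score of `tag` at token t;
    transitions[(prev, cur)] is the score of moving from `prev` to `cur`.
    Missing entries are treated as 0.0.
    """
    if not emissions:
        return []

    # best[t][tag]: best score of any path that ends in `tag` at token t.
    best = [{tag: emissions[0].get(tag, 0.0) for tag in tags}]
    # back[t-1][tag]: the previous tag on that best path.
    back = []

    for t in range(1, len(emissions)):
        scores = {}
        pointers = {}
        for cur in tags:
            prev = max(
                tags,
                key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0),
            )
            scores[cur] = (
                best[-1][prev]
                + transitions.get((prev, cur), 0.0)
                + emissions[t].get(cur, 0.0)
            )
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical toy scores for a three-token sentence (not from the paper).
TAGS = ["NOUN", "VERB"]
EMISSIONS = [
    {"NOUN": 2.0, "VERB": 0.5},
    {"NOUN": 1.0, "VERB": 1.2},
    {"NOUN": 0.3, "VERB": 2.0},
]
TRANSITIONS = {
    ("NOUN", "NOUN"): 0.5,
    ("NOUN", "VERB"): 0.2,
    ("VERB", "NOUN"): 0.1,
    ("VERB", "VERB"): -1.0,
}

print(viterbi_decode(EMISSIONS, TRANSITIONS, TAGS))
# prints ['NOUN', 'NOUN', 'VERB']
```

With these toy scores, picking the best tag for each token independently would give NOUN, VERB, VERB, whereas the sequence-level decoder returns NOUN, NOUN, VERB because the VERB-to-VERB transition is penalized. In a plain CRF the emission scores would typically come from hand-crafted features, and in a BiLSTM CRF from the recurrent network's outputs.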