Named Entity Recognition for Manipuri Using Support Vector Machine

This paper reports the development of a Named Entity Recognition (NER) system for Manipuri, a less computerized Indian language. Two models have been developed: one uses an active learning technique based on context patterns generated from an unlabeled news corpus, and the other is based on the well-known Support Vector Machine (SVM). The active learning technique serves as the baseline system. To apply SVM, the Manipuri news corpus has been manually annotated with the major NE tags, namely Person name, Location name, Organization name and Miscellaneous name. The SVM-based system uses contextual information about each word along with a variety of orthographic word-level features that help predict the NE classes. In addition, lexical context patterns generated by the active learning technique have been used as SVM features to improve performance. The system has been trained and tested with 28,629 and 4,763 wordforms, respectively. Experimental results show the effectiveness of the proposed approach, with overall average Recall, Precision and F-Score values of 93.91%, 95.32% and 94.59%, respectively.
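The combination of contextual, orthographic and pattern features described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names `word_features` and `sentence_features`, the context window of ±2 words, the affix length of 3, the `<PAD>` token and the `short_word` threshold are all hypothetical. The active-learning context patterns are simplified here to (previous word, next word) pairs, which is also an assumption.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical representation of learned lexical context patterns:
# (word before the candidate, word after the candidate).
Pattern = Tuple[str, str]


def word_features(
    tokens: List[str],
    i: int,
    window: int = 2,
    affix_len: int = 3,
    patterns: Optional[Set[Pattern]] = None,
) -> Dict[str, object]:
    """Build a sparse feature dict for tokens[i] (illustrative feature set)."""
    word = tokens[i]
    feats: Dict[str, object] = {
        "w0": word,
        "short_word": len(word) <= 3,               # assumed length threshold
        "has_digit": any(ch.isdigit() for ch in word),
        "all_digit": word.isdigit(),
    }
    # Word-level affix features: prefixes and suffixes up to affix_len chars.
    for k in range(1, affix_len + 1):
        if len(word) >= k:
            feats[f"pre{k}"] = word[:k]
            feats[f"suf{k}"] = word[-k:]
    # Contextual features: surrounding words, padded at sentence edges.
    # Keys look like "w-2", "w-1", "w+1", "w+2".
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        feats[f"w{off:+d}"] = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
    # Pattern feature: does the (previous, next) word pair match a learned pattern?
    if patterns is not None:
        prev_w = tokens[i - 1] if i > 0 else "<PAD>"
        next_w = tokens[i + 1] if i + 1 < len(tokens) else "<PAD>"
        feats["pattern_match"] = (prev_w, next_w) in patterns
    return feats


def sentence_features(
    tokens: List[str], patterns: Optional[Set[Pattern]] = None
) -> List[Dict[str, object]]:
    """Feature dicts for every token of one sentence."""
    return [word_features(tokens, i, patterns=patterns) for i in range(len(tokens))]
```

Each feature dict could then be vectorized, for example with scikit-learn's `DictVectorizer`, and passed to a linear SVM classifier such as `LinearSVC`. The SVM toolkit and kernel used for the reported results are not specified here, so this pairing is only one plausible choice.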