Identification of MWEs Using CRF in Manipuri and Improvement Using Reduplicated MWEs

This paper addresses the identification of Multiword Expressions (MWEs) in Manipuri text using Conditional Random Fields (CRF), a machine learning technique. Manipuri is spoken mainly in Manipur, a state in the north-eastern part of India, as well as in some parts of Myanmar and Bangladesh. Because the language is highly agglutinative, features such as surrounding words, POS tags, prefixes, suffixes, and length are selected as input to the CRF tool for identifying MWEs. The system achieves a recall of 60.39%, a precision of 85.53%, and an F-measure of 70.83%. Reduplicated MWEs are abundant in Manipuri, so identifying them improves the performance of the CRF-based system: recall rises to 62.24%, precision to 86.06%, and F-measure to 72.24%.
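
As an illustration of the feature types named above, the following minimal sketch builds a per-token feature dictionary (surrounding words, POS tag, prefix, suffix, length) and computes the F-measure as the harmonic mean of precision and recall. It is not the paper's implementation: the function names `token_features` and `f_measure`, the affix length of 3, the one-token context window, and the placeholder tokens are all assumptions made for the example.

```python
def token_features(sentence, i, affix_len=3, window=1):
    """Return a feature dict for token i of a sentence.

    `sentence` is a list of (word, pos_tag) pairs. The affix length and
    window size are illustrative assumptions, not values from the paper.
    """
    word, pos = sentence[i]
    feats = {
        "word": word,
        "pos": pos,
        "prefix": word[:affix_len],    # first characters of the token
        "suffix": word[-affix_len:],   # last characters of the token
        "length": len(word),
    }
    # Surrounding words and their POS tags within the context window.
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        if 0 <= j < len(sentence):
            feats[f"word[{off:+d}]"] = sentence[j][0]
            feats[f"pos[{off:+d}]"] = sentence[j][1]
        else:
            # Mark positions that fall outside the sentence boundary.
            feats[f"pad[{off:+d}]"] = True
    return feats


def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Placeholder tokens and tags, not real Manipuri data.
    sentence = [("token1", "NN"), ("token2", "VB")]
    print(token_features(sentence, 0))
    print(round(f_measure(86.06, 62.24), 2))
```

With the improved precision and recall reported above (86.06% and 62.24%), `f_measure` gives a value that rounds to the reported 72.24%.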