Named Entity Recognition using Conditional Random Fields

Abstract Identifying named entities (NEs) in electronic newspapers published in regional languages is an important step in machine translation and summarization systems. In this paper, we propose a statistical, machine-learning-based named entity recognition system for identifying and classifying named entities in Marathi text. The system identifies and classifies named entities using conditional random fields (CRFs). Because Marathi is a morphologically rich language, statistical algorithms achieve good NE identification and classification accuracy but need additional knowledge to improve it further. Experiments on the FIRE-2010 corpus show that our system, submitted for the challenge, achieves precision, recall and F1-measure of 82.33%, 70.68% and 75.51%, respectively, with the CRF algorithm.
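To make the CRF setup concrete, the following is a minimal sketch of how per-token features might be built before training a linear-chain CRF tagger. The feature set, the function names `token_features` and `sentence_features`, and the `max_affix` parameter are illustrative assumptions, not the features used in the submitted system. Character affixes are included only as a rough stand-in for the morphological knowledge mentioned above.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int, max_affix: int = 3) -> Dict[str, object]:
    """Build a feature dict for tokens[i].

    Hypothetical feature set for illustration; not the paper's actual features.
    """
    word = tokens[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "word": word,
        "is_digit": word.isdigit(),
        "length": len(word),
    }
    # Character prefixes and suffixes: a crude proxy for morphological cues.
    # Slicing is by code point, so for Devanagari text it may split a
    # consonant from its vowel sign.
    for n in range(1, max_affix + 1):
        if len(word) > n:
            feats[f"prefix{n}"] = word[:n]
            feats[f"suffix{n}"] = word[-n:]
    # Neighbouring words give the CRF local context.
    if i > 0:
        feats["prev_word"] = tokens[i - 1]
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        feats["next_word"] = tokens[i + 1]
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Feature dicts for every token of one sentence, in order."""
    return [token_features(tokens, i) for i in range(len(tokens))]
```

A list of such dicts per sentence, paired with a list of NE labels of the same length, is the input shape accepted by CRF toolkits such as sklearn-crfsuite; the choice of toolkit here is likewise an assumption.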