CUSAT_TEAM@IECSIL-FIRE-2018: A Named Entity Recognition System for Indian Languages

Named Entity Recognition (NER) is the process of classifying the elementary units of a text document into meaningful categories such as person, location, and organization. It is a significant preprocessing step in the semantic analysis of natural language text. Over the past decade, Indian language content has grown enormously across media such as websites, blogs, email, and chats. Automatically processing this large volume of unstructured data is challenging, especially when companies want to ascertain public opinion on their products and processes. NER is a subtask of Information Extraction (IE), whose ultimate goal is to extract structured information from natural language text. Various methods have been proposed and tested for NER. In this paper, we propose a Named Entity Recognition system for Indian languages using Conditional Random Fields. Training and testing are conducted on the shared corpus provided by the organizers of the 'ARNEKT-IECSIL 2018' competition. The evaluation results show that the proposed system outperforms most of the methods reported in the competition.
