Named Entity Recognition in Tamil Language Using Recurrent Based Sequence Model

Information extraction is a key task in natural language processing which helps in knowledge discovery by extracting facts from the semi-structured text like natural language. Named entity recognition is one of the subtask under information extraction. In this work, we use recurrent based sequence models called Long Short-Time Memory (LSTM) for named entities recognition in Tamil language and word representation for words is done through a distributed representation of words. For this work, we have created a Tamil named entities recognition corpus by crawling Wikipedia and we have also used openly available FIRE-2018 Information Extractor for Conversational Systems in Indian Languages (IECSIL) shared task corpus.