Paper Information - Named Entity Recognition in Tamil Language Using Recurrent Based Sequence Model

Information extraction is a key task in natural language processing that supports knowledge discovery by extracting facts from semi-structured text such as natural language. Named entity recognition is one of the subtasks of information extraction. In this work, we use a recurrent sequence model, Long Short-Term Memory (LSTM), for named entity recognition in the Tamil language, and words are represented using distributed word representations. For this work, we created a Tamil named entity recognition corpus by crawling Wikipedia, and we also used the openly available FIRE-2018 Information Extractor for Conversational Systems in Indian Languages (IECSIL) shared task corpus.

K. P. Soman | M. Anand Kumar | V. Hariharan

[1] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[2] M. Anand Kumar,et al. Entity Extraction for Malayalam Social Media Text Using Structured Skip-gram Based Embedding Features from Unlabeled Data , 2016 .

[3] K. P. Soman,et al. Distributional Semantic Representation for Text Classification and Information Retrieval , 2016, FIRE.

[4] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[5] Tomas Mikolov,et al. Enriching Word Vectors with Subword Information , 2016, TACL.

[6] K. P. Soman,et al. Randomized kernel approach for Named Entity Recognition in Tamil , 2015, SOCO 2015.

[7] Daniel Jurafsky,et al. Distant supervision for relation extraction without labeled data , 2009, ACL.

[8] Guillaume Lample,et al. Neural Architectures for Named Entity Recognition , 2016, NAACL.

[9] K. P. Soman,et al. Entity Extraction of Hindi-English and Tamil-English Code-Mixed Social Media Text , 2016, FIRE Workshop.

[10] K. P. Soman,et al. Information Extraction for Conversational Systems in Indian Languages - Arnekt IECSIL , 2018, FIRE.

[11] K. P. Soman,et al. From Vector Space Models to Vector Space Models of Semantics , 2016, FIRE Workshop.

[12] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.