Deep Neural Model for Manipuri Multiword Named Entity Recognition with Unsupervised Cluster Feature

Recognizing Multi-Word Named Entities (MNEs) is challenging in itself when the language is inflectional and agglutinative. Despite breakthroughs in NLP research with deep neural networks and language modelling techniques, the applicability of such techniques and algorithms to Indian languages like Manipuri remains an open question. This paper attempts to recognize Manipuri MNEs using a Long Short-Term Memory (LSTM) recurrent neural network model in conjunction with Part-of-Speech (POS) embeddings. To further improve classification accuracy, word cluster information obtained with K-means clustering is added as a feature embedding. The cluster information is generated from Skip-gram based word vectors, which capture the semantic and syntactic information of each word. The proposed model does not rely on extensive language-specific morphological features to raise its accuracy. Finally, the model's performance is compared with that of other machine learning based Manipuri MNE models.
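
The cluster feature described above can be illustrated with a minimal sketch. It is not the authors' implementation: the tokens `w1` to `w6`, their 2-D vectors, and the helper names `kmeans` and `cluster_features` are hypothetical, and the toy vectors only stand in for real Skip-gram word vectors. The sketch runs plain K-means over the vectors and maps each token of a sentence to its cluster ID, which could then be fed to an embedding layer alongside the word and POS inputs.

```python
# Toy 2-D vectors standing in for Skip-gram word vectors (hypothetical values).
toy_vectors = {
    "w1": [0.9, 0.1],
    "w2": [1.0, 0.2],
    "w3": [0.8, 0.0],
    "w4": [0.1, 0.9],
    "w5": [0.0, 1.0],
    "w6": [0.2, 0.8],
}


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=10):
    """Plain Lloyd's K-means; returns one cluster ID per input vector."""
    # Deterministic initialisation (an assumption): the first k vectors.
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


words = list(toy_vectors)
labels = kmeans([toy_vectors[w] for w in words], k=2)
cluster_of = dict(zip(words, labels))


def cluster_features(tokens, unknown_id=-1):
    """Map each token to its cluster ID; unseen tokens get unknown_id."""
    return [cluster_of.get(t, unknown_id) for t in tokens]


sentence = ["w1", "w4", "w2", "unseen"]
print(cluster_features(sentence))
```

In practice the number of clusters and the handling of out-of-vocabulary tokens are tuning choices; the values used here are placeholders for illustration only.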
