Learning Morpheme Representation for Mongolian Named Entity Recognition

Traditional approaches to Mongolian named entity recognition rely heavily on feature engineering. Worse, the complex morphological structure of Mongolian words makes the data even sparser. To alleviate feature engineering and data sparsity in Mongolian named entity recognition, we propose a recurrent neural network framework with morpheme representations, and we study this framework in depth through several model variants. More specifically, the morpheme representation exploits characteristics of the classical Mongolian script and can be learned from an unlabeled corpus. The model is further augmented with different character representations and auxiliary language-model losses that extract contextual knowledge from scratch. By decoding jointly with a Conditional Random Field layer, the model learns the dependencies between labels. Experimental results show that feeding morpheme representations into the neural network outperforms word representations, and that the additional character representation and morpheme language-model loss further improve performance.
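The joint decoding step mentioned above can be illustrated with Viterbi decoding over a linear-chain CRF. This is a minimal sketch: the labels, emission scores, and transition scores below are toy assumptions for illustration, whereas in the actual model the emission scores would come from the recurrent network and the transition scores would be learned CRF parameters.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring label sequence under a linear-chain CRF.

    emissions: list of dicts mapping label -> score, one dict per token.
    transitions: dict mapping (prev_label, label) -> transition score.
    """
    labels = list(emissions[0].keys())
    # Best score of any path ending in each label at the first token.
    best = {y: emissions[0][y] for y in labels}
    backptr = []
    for em in emissions[1:]:
        new_best, ptr = {}, {}
        for y in labels:
            # Choose the best previous label to transition from.
            prev, score = max(
                ((p, best[p] + transitions[(p, y)]) for p in labels),
                key=lambda t: t[1])
            new_best[y] = score + em[y]
            ptr[y] = prev
        best, backptr = new_best, backptr + [ptr]
    # Backtrack from the best final label to recover the full path.
    last = max(best, key=best.get)
    path = [last]
    for ptr in reversed(backptr):
        last = ptr[last]
        path.append(last)
    return list(reversed(path))


# Toy example with two labels: "B" (entity begin) and "O" (outside).
emissions = [{"B": 2.0, "O": 1.0},
             {"B": 0.0, "O": 0.5},
             {"B": 1.0, "O": 2.0}]
transitions = {("B", "B"): -1.0, ("B", "O"): 0.0,
               ("O", "B"): 0.0, ("O", "O"): 0.0}
print(viterbi_decode(emissions, transitions))  # -> ['B', 'O', 'O']
```

Because the transition scores enter the path score directly, decoding considers whole label sequences rather than per-token decisions, which is what lets the CRF layer capture label dependencies such as a begin tag preceding an inside tag.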
