Recurrent Neural Network Based Loanwords Identification in Uyghur

Comparable corpora are the most important resource for several NLP tasks, but they are very expensive to collect manually. Lexical borrowing occurs in almost all languages, so loanwords can be used to detect useful bilingual knowledge and to expand donor-recipient / recipient-donor comparable corpora. In this paper, we propose a recurrent neural network (RNN) based framework to identify loanwords in Uyghur. We also propose two features, an inverse language model feature and a collocation feature, to improve the performance of our model. Experimental results show that our approach outperforms several sequence labeling baselines.
