Paper Information - A Neural Network Based Model for Loanword Identification in Uyghur

Lexical borrowing happens in almost all languages. To obtain more bilingual knowledge from monolingual corpora, we propose a neural network based loanword identification model for Uyghur. We build our model on a bidirectional LSTM-CNN framework, which captures past and future information effectively and automatically learns both word-level and character-level features from training data. To overcome the data sparsity that arises in model training, we also propose three additional features to further improve the performance of our approach: a hybrid language model feature, a pronunciation similarity feature, and a part-of-speech tagging feature. We conduct experiments on Chinese, Arabic, and Russian loanword detection in Uyghur. Experimental results show that our proposed method outperforms several baseline models.
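
The abstract names a pronunciation similarity feature but does not say how it is computed. As a minimal sketch only, one common way to score how alike two words sound is a normalized edit distance over romanized transcriptions. Everything below is an assumption made for illustration: the function names `edit_distance` and `pronunciation_similarity`, the choice of Levenshtein distance, and the example transcriptions are not taken from the paper.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row dynamic program."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def pronunciation_similarity(candidate: str, donor: str) -> float:
    """Hypothetical feature: similarity in [0, 1], where 1.0 means identical.

    Both arguments are assumed to be romanized (Latin-script) transcriptions
    of a Uyghur candidate word and a donor-language word.
    """
    if not candidate and not donor:
        return 1.0
    longest = max(len(candidate), len(donor))
    return 1.0 - edit_distance(candidate, donor) / longest


if __name__ == "__main__":
    # Illustrative transcriptions only, not data from the paper.
    print(pronunciation_similarity("doxtur", "doktor"))
```

A score like this could be supplied to a tagger as one extra input per word, alongside the learned word-level and character-level representations. Whether the authors use this particular measure is not stated in the abstract.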

Tonghai Jiang | Lei Wang | Yating Yang | Chenggang Mi | Xi Zhou

[1] Yulia Tsvetkov et al. Lexicon Stratification for Translating Out-of-Vocabulary Words, 2015, ACL.

[2] Eric Nichols et al. Named Entity Recognition with Bidirectional LSTM-CNNs, 2015, TACL.

[3] Yulia Tsvetkov et al. Constraint-Based Models of Lexical Borrowing, 2015, NAACL.

[4] Chen Ping. A Comparison on the methods of Uyghur and Chinese Loan Words, 2011.

[5] Yulia Tsvetkov et al. Cross-Lingual Bridges with Models of Lexical Borrowing, 2016, J. Artif. Intell. Res.

[6] Mi Chenggan. Recognition of Chinese Loan Words in Uyghur Based on String Similarity, 2013.

[7] Tonghai Jiang et al. Recurrent Neural Network Based Loanwords Identification in Uyghur, 2016, PACLIC.

[8] Xiao Li et al. Detection of Loan Words in Uyghur Texts, 2014, NLPCC.