Cross-Lingual Transfer Learning for POS Tagging without Cross-Lingual Resources

Training a POS tagging model with cross-lingual transfer learning usually requires linguistic knowledge and resources relating the source language to the target language. In this paper, we introduce a cross-lingual transfer learning model for POS tagging that requires no ancillary resources such as parallel corpora. The proposed model uses a common bidirectional LSTM (BLSTM) that enables knowledge transfer from other languages, together with private BLSTMs for language-specific representations. The model is trained with language-adversarial training and bidirectional language modeling as auxiliary objectives, so that it better represents language-general information without losing information specific to the target language. Evaluated on POS datasets for 14 languages from the Universal Dependencies corpus, the proposed transfer learning model improves POS tagging performance on the target languages without exploiting any linguistic knowledge relating the source and target languages.
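
The shared/private architecture with language-adversarial training can be made concrete with a short sketch. The following PyTorch code is a minimal illustration under stated assumptions, not the authors' implementation: it assumes one common BLSTM shared across languages, one private BLSTM per language, and a language discriminator trained through a gradient-reversal layer; the bidirectional language-modeling auxiliary objective is omitted for brevity, and all class names, layer sizes, and the `lambd` scaling factor are illustrative.

```python
import torch
import torch.nn as nn


class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; negates (and scales) gradients in the
    backward pass, as in domain-adversarial training (Ganin et al.)."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse the gradient flowing into the shared encoder.
        return -ctx.lambd * grad_output, None


class SharedPrivateTagger(nn.Module):
    """Hypothetical shared/private BLSTM tagger: a common BLSTM captures
    language-general features, a per-language private BLSTM captures
    language-specific features, and a language discriminator on the common
    features is trained adversarially via gradient reversal."""

    def __init__(self, vocab_size, emb_dim, hidden_dim, n_tags, n_langs):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Common BLSTM shared by all languages.
        self.common = nn.LSTM(emb_dim, hidden_dim,
                              bidirectional=True, batch_first=True)
        # One private BLSTM per language.
        self.private = nn.ModuleList(
            [nn.LSTM(emb_dim, hidden_dim,
                     bidirectional=True, batch_first=True)
             for _ in range(n_langs)])
        # POS tagger reads the concatenated common + private features.
        self.tagger = nn.Linear(4 * hidden_dim, n_tags)
        # Language discriminator on the common features only.
        self.lang_clf = nn.Linear(2 * hidden_dim, n_langs)

    def forward(self, tokens, lang_id, lambd=1.0):
        emb = self.embed(tokens)                      # (B, T, E)
        common_out, _ = self.common(emb)              # (B, T, 2H)
        private_out, _ = self.private[lang_id](emb)   # (B, T, 2H)
        tag_logits = self.tagger(
            torch.cat([common_out, private_out], dim=-1))
        # Adversarial branch: reversed gradients push the common BLSTM
        # toward representations the discriminator cannot separate.
        reversed_feats = GradientReversal.apply(
            common_out.mean(dim=1), lambd)
        lang_logits = self.lang_clf(reversed_feats)
        return tag_logits, lang_logits
```

In training, a cross-entropy tagging loss on `tag_logits` and a language-classification loss on `lang_logits` would be summed; because of the gradient reversal, minimizing the language loss drives the common BLSTM toward language-invariant representations while the private BLSTMs remain free to encode target-language specifics.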
