Character-level Supervision for Low-resource POS Tagging

Neural part-of-speech (POS) taggers are known to perform poorly with little training data. As a step towards overcoming this problem, we present an architecture for learning more robust neural POS taggers by jointly training a hierarchical, recurrent model and a recurrent character-based sequence-to-sequence network supervised using an auxiliary objective. This introduces stronger character-level supervision into the model, which enables better generalization to unseen words and provides regularization, making our encoding less prone to overfitting. We experiment with three auxiliary tasks: lemmatization, character-based word autoencoding, and character-based random string autoencoding. Experiments with minimal amounts of labeled data on 34 languages show that our new architecture outperforms a single-task baseline and, surprisingly, that raw text autoencoding can, on average, be as beneficial for low-resource POS tagging as using lemma information. Our neural POS tagger closes the gap to a state-of-the-art POS tagger (MarMoT) for low-resource scenarios by 43%, even outperforming it on languages with templatic morphology, e.g., Arabic, Hebrew, and Turkish, by some margin.
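The three auxiliary tasks named in the abstract differ only in which character-level (input, target) pairs are fed to the sequence-to-sequence network. The following is a minimal sketch of how such pairs might be built, not the authors' implementation: the function names, the length range for random strings, and the choice to draw random characters from the alphabet of the observed words are all assumptions made for illustration.

```python
import random


def lemmatization_pair(word, lemma):
    """Input: characters of the inflected word; target: characters of its lemma."""
    return list(word), list(lemma)


def word_autoencoding_pair(word):
    """Input and target are the same word, so no lemma annotation is needed."""
    return list(word), list(word)


def random_string_autoencoding_pair(rng, alphabet, min_len=3, max_len=10):
    """Input and target are the same random character string.

    The length range and the uniform sampling are assumptions of this sketch.
    """
    length = rng.randint(min_len, max_len)
    chars = [rng.choice(alphabet) for _ in range(length)]
    return chars, list(chars)


def build_auxiliary_examples(task, words, lemmas=None, seed=0):
    """Return a list of (input_chars, target_chars) pairs for one auxiliary task."""
    rng = random.Random(seed)
    # Assumed alphabet: every character seen in the available words.
    alphabet = sorted({ch for w in words for ch in w})
    if task == "lemmatization":
        if lemmas is None:
            raise ValueError("lemmatization requires one lemma per word")
        return [lemmatization_pair(w, l) for w, l in zip(words, lemmas)]
    if task == "word_autoencoding":
        return [word_autoencoding_pair(w) for w in words]
    if task == "random_autoencoding":
        return [random_string_autoencoding_pair(rng, alphabet) for _ in words]
    raise ValueError(f"unknown auxiliary task: {task}")
```

In a joint setup of the kind the abstract describes, pairs like these would be passed to the character-based network alongside the tagged sentences used by the POS tagger. Only the lemmatization task needs lemma annotations; the two autoencoding tasks can be built from raw text or from no text at all.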

Isabelle Augenstein | Katharina Kann | Barbara Plank | Anders Søgaard | Johannes Bjerva
