Character-level Supervision for Low-resource POS Tagging

Neural part-of-speech (POS) taggers are known to not perform well with little training data. As a step towards overcoming this problem, we present an architecture for learning more robust neural POS taggers by jointly training a hierarchical, recurrent model and a recurrent character-based sequence-to-sequence network supervised using an auxiliary objective. This way, we introduce stronger character-level supervision into the model, which enables better generalization to unseen words and provides regularization, making our encoding less prone to overfitting. We experiment with three auxiliary tasks: lemmatization, character-based word autoencoding, and character-based random string autoencoding. Experiments with minimal amounts of labeled data on 34 languages show that our new architecture outperforms a single-task baseline and, surprisingly, that, on average, raw text autoencoding can be as beneficial for low-resource POS tagging as using lemma information. Our neural POS tagger closes the gap to a state-of-the-art POS tagger (MarMoT) for low-resource scenarios by 43%, even outperforming it on languages with templatic morphology, e.g., Arabic, Hebrew, and Turkish, by some margin.

[1]  Oriol Vinyals,et al.  Multilingual Language Processing From Bytes , 2015, NAACL.

[2]  Joachim Bingel,et al.  Identifying beneficial task relations for multi-task learning in deep neural networks , 2017, EACL.

[3]  Eliyahu Kiperwasser,et al.  Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations , 2016, TACL.

[4]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[5]  Anders Søgaard Data point selection for cross-language adaptation of dependency parsers , 2011, ACL.

[6]  Yoshua Bengio,et al.  On the Properties of Neural Machine Translation: Encoder–Decoder Approaches , 2014, SSST@EMNLP.

[7]  Ruslan Salakhutdinov,et al.  Multi-Task Cross-Lingual Sequence Tagging from Scratch , 2016, ArXiv.

[8]  Johan Bos,et al.  Semantic Tagging with Deep Residual Networks , 2016, COLING.

[9]  Andrew McCallum,et al.  Reducing Weight Undertraining in Structured Discriminative Learning , 2006, NAACL.

[10]  Katharina Kann,et al.  Single-Model Encoder-Decoder with Explicit Morphological Representation for Reinflection , 2016, ACL.

[11]  Regina Barzilay,et al.  Selective Sharing for Multilingual Dependency Parsing , 2012, ACL.

[12]  Barbara Plank,et al.  When is multitask learning effective? Semantic sequence prediction under varying data conditions , 2016, EACL.

[13]  Joakim Nivre,et al.  Token and Type Constraints for Cross-Lingual Part-of-Speech Tagging , 2013, TACL.

[14]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[15]  Christopher D. Manning,et al.  Cross-lingual Projected Expectation Regularization for Weakly Supervised Learning , 2014, TACL.

[16]  Sebastian Riedel,et al.  The CoNLL 2007 Shared Task on Dependency Parsing , 2007, EMNLP.

[17]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[18]  Simon Clematide,et al.  Align and Copy: UZH at SIGMORPHON 2017 Shared Task for Morphological Reinflection , 2017, CoNLL.

[19]  Guillaume Lample,et al.  Polyglot Neural Language Models: A Case Study in Cross-Lingual Phonetic Representation Learning , 2016, NAACL.

[20]  Soroush Vosoughi,et al.  Tweet2Vec: Learning Tweet Embeddings Using Character-level CNN-LSTM Encoder-Decoder , 2016, SIGIR.

[21]  Hinrich Schütze,et al.  Efficient Higher-Order CRFs for Morphological Tagging , 2013, EMNLP.

[22]  Noah A. Smith,et al.  Improved Transition-based Parsing by Modeling Characters instead of Words with LSTMs , 2015, EMNLP.

[23]  Christopher D. Manning,et al.  Cross-lingual Pseudo-Projected Expectation Regularization for Weakly Supervised Learning , 2013, ArXiv.

[24]  Johannes Bjerva,et al.  Will my auxiliary tagging task help? Estimating Auxiliary Tasks Effectivity in Multi-Task Learning , 2017, NODALIDA.

[25]  Yonatan Belinkov,et al.  Improving Sequence to Sequence Learning for Morphological Inflection Generation: The BIU-MIT Systems for the SIGMORPHON 2016 Shared Task for Morphological Reinflection , 2016, SIGMORPHON.

[26]  Quoc V. Le,et al.  Semi-supervised Sequence Learning , 2015, NIPS.

[27]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[28]  Barbara Plank,et al.  Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss , 2016, ACL.

[29]  Dirk Hovy,et al.  If all you have is a bit of the Bible: Learning POS taggers for truly low-resource languages , 2015, ACL.

[30]  Ryan Cotterell,et al.  One-Shot Neural Cross-Lingual Transfer for Paradigm Completion , 2017, ACL.

[31]  Joachim Bingel,et al.  Sluice networks: Learning what to share between loosely related tasks , 2017, ArXiv.

[32]  Johannes Bjerva,et al.  One Model to Rule them all: Multitask and Multilingual Modelling for Lexical Analysis , 2017, ArXiv.

[33]  Helmut Schmid,et al.  Improvements in Part-of-Speech Tagging with an Application to German , 1999 .

[34]  Ben Taskar,et al.  Wiki-ly Supervised Part-of-Speech Tagging , 2012, EMNLP.

[35]  Wang Ling,et al.  Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation , 2015, EMNLP.

[36]  Marek Rei,et al.  Semi-supervised Multitask Learning for Sequence Labeling , 2017, ACL.

[37]  Yifan Gong,et al.  Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[38]  Anders Søgaard,et al.  Deep multi-task learning with low level tasks supervised at lower layers , 2016, ACL.

[39]  Chris Biemann Unsupervised Part-of-Speech Tagging , 2012 .

[40]  Thomas L. Griffiths,et al.  A fully Bayesian approach to unsupervised part-of-speech tagging , 2007, ACL.

[41]  Noah A. Smith,et al.  Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance , 2011, EMNLP.

[42]  Hai Zhao,et al.  A Unified Tagging Solution: Bidirectional LSTM Recurrent Neural Network with Word Embedding , 2015, ArXiv.

[43]  Noah A. Smith,et al.  Many Languages, One Parser , 2016, TACL.

[44]  Dianhai Yu,et al.  Multi-Task Learning for Multiple Language Translation , 2015, ACL.

[45]  Jan Niehues,et al.  Toward Multilingual Neural Machine Translation with Universal Encoder and Decoder , 2016, IWSLT.

[46]  Isabelle Augenstein,et al.  Multi-Task Learning of Pairwise Sequence Classification Tasks over Disparate Label Spaces , 2018, NAACL.

[47]  Martin Wattenberg,et al.  Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation , 2016, TACL.

[48]  David Yarowsky,et al.  Inducing Multilingual Text Analysis Tools via Robust Projection across Aligned Corpora , 2001, HLT.