Weakly Supervised POS Taggers Perform Poorly on Truly Low-Resource Languages

Part-of-speech (POS) taggers for low-resource languages that rely exclusively on various forms of weak supervision (e.g., cross-lingual transfer, type-level supervision, or a combination thereof) have been reported to perform almost as well as supervised ones. However, weakly supervised POS taggers are commonly evaluated only on languages that differ greatly from truly low-resource languages, and they draw on sources of information, such as high-coverage and almost error-free dictionaries, that are unlikely to be available for resource-poor languages. We train and evaluate state-of-the-art weakly supervised POS taggers on a typologically diverse set of 15 truly low-resource languages. On these languages, given a realistic amount of resources, even our best model tags fewer than half of the words correctly. Our results highlight the need for new and different approaches to POS tagging for truly low-resource languages.
