How to Parse a Creole: When Martinican Creole Meets French

We investigate methods for developing a parser for Martinican Creole, a highly under-resourced language, with the help of a French treebank. We compare transfer learning and multi-task learning models, and we examine different input features as well as strategies for handling the massive size imbalance between the two treebanks. Surprisingly, we find that a simple concatenated (French + Martinican Creole) baseline yields the best results, even though it has access to only 80 Martinican Creole sentences. POS embeddings work better than lexical ones, but they suffer from negative transfer.
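As an illustration of what a concatenated training set could look like, here is a minimal sketch that assumes both treebanks are available as CoNLL-U text. The function names, the `small_repeats` upsampling knob, and the placeholder tokens are hypothetical and not taken from the paper; the abstract does not say how the concatenation was carried out.

```python
import re
from typing import List


def read_conllu_sentences(text: str) -> List[str]:
    """Split CoNLL-U text into sentence blocks (blocks are separated by blank lines)."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return [block.strip() for block in blocks if block.strip()]


def concatenate_treebanks(large: List[str], small: List[str],
                          small_repeats: int = 1) -> List[str]:
    """Return one training set: all large-treebank sentences, then the small
    treebank repeated `small_repeats` times (1 = plain concatenation).
    Repetition is an assumed, illustrative way to soften the size imbalance."""
    if small_repeats < 1:
        raise ValueError("small_repeats must be at least 1")
    return list(large) + list(small) * small_repeats


# Placeholder data: the tokens below are dummies, not real treebank content.
LARGE_TEXT = (
    "1\tw1\t_\tNOUN\t_\t_\t0\troot\t_\t_\n\n"
    "1\tw2\t_\tVERB\t_\t_\t0\troot\t_\t_\n"
)
SMALL_TEXT = "1\tw3\t_\tPRON\t_\t_\t0\troot\t_\t_\n"

large_sentences = read_conllu_sentences(LARGE_TEXT)
small_sentences = read_conllu_sentences(SMALL_TEXT)
train_sentences = concatenate_treebanks(large_sentences, small_sentences)
```

With `small_repeats=1` this is plain concatenation; larger values simply repeat the small treebank, which is only one of several conceivable ways to address the imbalance.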
