Bertinho: Galician BERT Representations

This paper presents a monolingual BERT model for Galician. We follow the recent trend showing that it is feasible to build robust monolingual BERT models even for relatively low-resource languages, and that such models can outperform the well-known official multilingual BERT (mBERT). More specifically, we release two monolingual Galician BERT models, built with 6 and 12 transformer layers, respectively, and trained with limited resources (∼45 million tokens on a single 24GB GPU). We then provide an exhaustive evaluation on a number of tasks such as POS-tagging, dependency parsing and named entity recognition. For this purpose, all of these tasks are cast as pure sequence labeling, so that BERT can be run without any additional layers on top of it: a single output classification layer maps the contextualized representations to the predicted labels. The experiments show that our models, especially the 12-layer one, outperform mBERT on most tasks.
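To make the sequence-labeling setup concrete, the sketch below shows how a BERT encoder with only a token-level classification head can be used for a task such as POS-tagging via the HuggingFace Transformers API. This is a minimal illustration under stated assumptions, not the paper's released code: the checkpoint shown is mBERT as a stand-in for the Galician models, the tag set is a toy example, and the classification layer would need to be fine-tuned on annotated Galician data before its predictions are meaningful.

```python
# Minimal sketch: BERT + a single token-classification (output) layer,
# as in the pure sequence-labeling setup described in the abstract.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Stand-in checkpoint; the released Galician (Bertinho) model would replace it.
MODEL_NAME = "bert-base-multilingual-cased"
POS_TAGS = ["ADJ", "ADP", "ADV", "DET", "NOUN", "PRON", "PUNCT", "VERB"]  # toy tag set

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# AutoModelForTokenClassification adds exactly one linear layer on top of the
# contextualized representations; here it is randomly initialized, so it must
# be fine-tuned on labeled data before use.
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_NAME, num_labels=len(POS_TAGS)
)

sentence = ["O", "neno", "xoga", "no", "parque", "."]
encoding = tokenizer(
    sentence,
    is_split_into_words=True,  # keep the original word boundaries
    return_tensors="pt",
)

with torch.no_grad():
    logits = model(**encoding).logits           # shape: (1, num_subwords, num_labels)
predictions = logits.argmax(dim=-1).squeeze(0)  # one label id per subword

# Map subword predictions back to words, keeping the first subword of each word.
word_ids = encoding.word_ids(batch_index=0)
seen = set()
for idx, word_id in enumerate(word_ids):
    if word_id is None or word_id in seen:
        continue
    seen.add(word_id)
    print(sentence[word_id], POS_TAGS[predictions[idx].item()])
```

The same head suffices for named entity recognition (BIO tags per token) and, following the sequence-labeling parsing line of work cited in the paper, for dependency parsing once tree information is linearized into per-word labels; only the label inventory changes.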
