GreekBART: The First Pretrained Greek Sequence-to-Sequence Model

The era of transfer learning has revolutionized the fields of Computer Vision and Natural Language Processing, bringing powerful pretrained models that achieve exceptional performance across a variety of tasks. In particular, Natural Language Processing has been dominated by transformer-based language models: on Natural Language Inference and Natural Language Generation tasks, BERT and its variants, as well as GPT and its successors, have demonstrated exemplary performance. However, the majority of these models are pretrained and evaluated primarily on English or on multilingual corpora. In this paper, we introduce GreekBART, the first Seq2Seq model based on the BART-base architecture and pretrained on a large-scale Greek corpus. We evaluate and compare GreekBART against BART-random, Greek-BERT, and XLM-R on a variety of discriminative tasks. In addition, we examine its performance on two NLG tasks from GreekSUM, a newly introduced summarization dataset for the Greek language. The model, the code, and the new summarization dataset will be made publicly available.
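
Since GreekBART follows the standard BART-base Seq2Seq interface, a released checkpoint could in principle be loaded with the Hugging Face Transformers library; the sketch below illustrates this for abstractive summarization. The model identifier, the example article, and the decoding parameters are illustrative assumptions rather than values specified by the paper.

```python
# A minimal sketch, assuming the GreekBART checkpoint is published in the
# standard Hugging Face format; the model identifier below is a placeholder.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "your-org/greekbart"  # hypothetical Hub identifier

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# A short Greek news-style passage to summarize (illustrative text only).
article = (
    "Η κυβέρνηση ανακοίνωσε σήμερα νέα μέτρα στήριξης της οικονομίας, "
    "τα οποία αναμένεται να τεθούν σε ισχύ από τον επόμενο μήνα."
)

# Encode the source article and generate an abstractive summary with beam search.
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=1024)
summary_ids = model.generate(
    **inputs,
    num_beams=4,       # common decoding choice for summarization
    max_length=64,     # cap the length of the generated summary
    early_stopping=True,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

The same checkpoint could likewise be fine-tuned with a classification head for the discriminative tasks mentioned above.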
