ur-iw-hnt at CheckThat!-2022: Cross-lingual Text Summarization for Fake News Detection

We describe our submission to the CLEF CheckThat! 2022 challenge. We contributed to Tasks 3A and 3B, multiclass fake news classification in English and German, respectively. Our approach combines extractive and abstractive summarization techniques, using fine-tuned DistilBART and T5-3B models. For the cross-lingual setting, we use automatic machine translation to improve model inference. Our submitted run for Task 3B was the official winner according to both F1 score and accuracy, with a fair margin over the second-place entry. For Task 3A, we describe the wide range of models we experimented with. Since only one submission per team was permitted, we also describe a non-submitted setup that would have topped the leaderboard for this task.
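To illustrate the summarize-then-classify idea sketched in the abstract, the following is a minimal Python sketch using the Hugging Face `transformers` pipelines. The checkpoint names (`sshleifer/distilbart-cnn-12-6`, `Helsinki-NLP/opus-mt-de-en`, `bert-base-uncased`) and the helper function are illustrative assumptions, not the authors' fine-tuned models or exact setup.

```python
# Minimal sketch of the summarize-then-classify approach described above.
# All checkpoint names and the classifier head are assumptions for illustration;
# the paper fine-tunes its own DistilBART / T5-3B variants.
from transformers import pipeline

# Abstractive summarization with a public DistilBART checkpoint.
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

# German-to-English machine translation for the cross-lingual setting.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-de-en")

# Placeholder classifier; in practice a transformer fine-tuned on the
# CheckThat! Task 3 label set would be used here.
classifier = pipeline("text-classification", model="bert-base-uncased")


def classify_article(text: str, german: bool = False) -> dict:
    """Translate (if needed), summarize, then classify a news article."""
    if german:
        text = translator(text, max_length=512)[0]["translation_text"]
    summary = summarizer(text, max_length=150, min_length=30,
                         do_sample=False)[0]["summary_text"]
    return classifier(summary)[0]
```

The point of this sketch is only the ordering of steps (optional translation, then summarization, then classification on the shortened text); any of the three components could be swapped for the extractive or T5-based variants mentioned in the abstract.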
