French Tweet Corpus for Automatic Stance Detection

Automatic stance detection consists of determining the attitude expressed in a text toward a target (a text, claim, or entity). It is a typical intermediate task for fake news detection and analysis, a widespread problem that is particularly difficult to address. This work creates a human-annotated corpus for automatic stance detection in tweets written in French. It draws on a corpus of tweets collected during July and August 2018. To the best of our knowledge, this is the first freely available stance-annotated tweet corpus in French. The annotation uses the four classes broadly adopted by the community (support, deny, query, and comment), plus an additional ignore class. This paper presents the corpus, the tools used to build it, its construction, an analysis of inter-rater reliability, and the challenges and questions raised during the building process.
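As an illustration of what an inter-rater reliability analysis over the five labels can look like, the following minimal sketch computes Cohen's kappa for two annotators. The abstract does not say which agreement coefficient the analysis uses, so the choice of Cohen's kappa is an assumption; the function name `cohen_kappa` and the toy annotations are hypothetical and are not taken from the corpus.

```python
from collections import Counter

# The five annotation classes named in the abstract.
LABELS = ("support", "deny", "query", "comment", "ignore")


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items (hypothetical helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: share of items on which both raters chose the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label]
                   for label in set(counts_a) | set(counts_b)) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label throughout; kappa is
        # undefined (0/0), and this sketch treats it as perfect agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy annotations, invented for illustration only.
annotator_1 = ["support", "deny", "query", "comment", "ignore", "comment"]
annotator_2 = ["support", "deny", "comment", "comment", "ignore", "comment"]

print(round(cohen_kappa(annotator_1, annotator_2), 3))
```

With more than two annotators, a multirater coefficient would be needed instead; this sketch covers only the pairwise case.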
