NLP-based Feature Extraction for the Detection of COVID-19 Misinformation Videos on YouTube

We present a simple NLP methodology for detecting COVID-19 misinformation videos on YouTube by leveraging user comments. We apply transfer learning to pre-trained models to build a multi-label classifier that can categorize conspiratorial content. We then use the percentage of misinformation comments on each video as a new feature for video classification. Including this feature in simple models yields an accuracy of up to 82.2%. We further verify the significance of the feature through a Bayesian analysis. Finally, we show that adding the first hundred comments as tf-idf features raises the video classifier accuracy to as much as 89.4%.
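The comment-based feature described above can be illustrated with a minimal sketch. It assumes each comment has already been labeled by some comment-level classifier (1 for misinformation, 0 otherwise). The function names, the toy data, and the 0 to 100 scale are hypothetical choices for illustration, not details taken from the paper.

```python
from typing import Dict, List, Tuple

# A comment is a (text, label) pair; label 1 means the comment classifier
# flagged it as misinformation. This data layout is an assumption.
Comment = Tuple[str, int]


def misinformation_share(comments: List[Comment]) -> float:
    """Return the percentage (0-100) of comments flagged as misinformation."""
    if not comments:
        return 0.0  # assumed convention for videos without comments
    flagged = sum(label for _, label in comments)
    return 100.0 * flagged / len(comments)


def first_comments_document(comments: List[Comment], limit: int = 100) -> str:
    """Join the text of the first `limit` comments into one document,
    which could then be passed to a tf-idf vectorizer."""
    return " ".join(text for text, _ in comments[:limit])


# Hypothetical toy data: two videos with pre-labeled comments.
videos: Dict[str, List[Comment]] = {
    "video_a": [("it is a hoax", 1), ("stay safe", 0),
                ("5g did this", 1), ("plandemic", 1)],
    "video_b": [("thanks doctor", 0), ("good summary", 0), ("wash hands", 0),
                ("very clear", 0), ("all fake", 1)],
}

share = {vid: misinformation_share(c) for vid, c in videos.items()}
docs = {vid: first_comments_document(c) for vid, c in videos.items()}
```

Each video then contributes one numeric feature (`share`) and one text document (`docs`) to whatever video-level classifier is used downstream.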
