Tackling spam in the era of end-to-end encryption: A case study of WhatsApp

WhatsApp is a popular messaging app used by over a billion users around the globe. Due to this popularity, spam on WhatsApp is an important issue. Despite this, the distribution of spam via WhatsApp remains understudied by researchers, in part because of the end-to-end encryption offered by the platform. This paper addresses this gap by studying spam in a dataset of 2.6 million messages sent to 5,051 public WhatsApp groups in India over 300 days. First, we characterise spam content shared within public groups and find that nearly 1 in 10 messages is spam. We observe a wide selection of topics ranging from job ads to adult content, and find that spammers post both URLs and phone numbers to promote material. Second, we inspect the nature of spammers themselves. We find that spam is often disseminated by groups of phone numbers, and that spam messages are generally shared for a longer duration than non-spam messages. Finally, we devise content-based and activity-based detection algorithms that can counter spam.
