Detecting Clickbait in Online Social Media: You Won't Believe How We Did It

In this paper, we propose an approach for detecting clickbait posts in online social media (OSM). Clickbait posts are short, catchy phrases designed to attract a user's attention and entice them to click through to an article. The approach is based on a machine learning (ML) classifier capable of distinguishing between clickbait and legitimate posts published in OSM. The suggested classifier relies on a variety of features, including image-related features, linguistic analysis, and methods for abuser detection. To evaluate our method, we used two datasets provided by the Clickbait Challenge 2017. The best performance obtained by the ML classifier was an AUC of 0.8, an accuracy of 0.812, a precision of 0.819, and a recall of 0.966. In addition, in contrast to previous studies, we found that clickbait post titles are statistically significantly shorter than legitimate post titles. Finally, we found that counting the number of formal English words in the given content is useful for clickbait detection.
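As a rough illustration of the two linguistic signals mentioned above (title length and the count of formal English words), the following minimal sketch computes them for a single post. It is not the feature extractor used in the paper: the function names, the tokenization rule, and the tiny `formal_vocabulary` set are assumptions made for this example, and a real system would load a full English dictionary instead.

```python
import re

# Assumed tokenization rule for this sketch: runs of letters and apostrophes.
TOKEN_RE = re.compile(r"[A-Za-z']+")


def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


def post_features(post_text, formal_vocabulary):
    """Compute simple length and formality features for one post.

    formal_vocabulary is a hypothetical set of lowercase dictionary words;
    the paper does not specify which word list it uses.
    """
    tokens = tokenize(post_text)
    formal = sum(1 for token in tokens if token in formal_vocabulary)
    return {
        "char_length": len(post_text),
        "word_count": len(tokens),
        "formal_word_count": formal,
        "formal_word_ratio": formal / len(tokens) if tokens else 0.0,
    }


if __name__ == "__main__":
    # Toy vocabulary for demonstration only.
    vocabulary = {"the", "senate", "passed", "budget"}
    print(post_features("The Senate passed the budget", vocabulary))
    print(post_features("OMG u won't believe this", vocabulary))
```

Features of this kind would then be fed, together with the image-related and abuser-detection features, into whichever classifier is being trained.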
