Detecting clickbaits using two-phase hybrid CNN-LSTM biterm model

Abstract Clickbait is content designed primarily to attract readers' attention, and it has become a nuisance to social media users. Its purpose is to place an appealing link in front of users: clickbait headlines make readers curious enough to open the underlying content. The text of a clickbait post is usually very short, which makes it hard to extract features that identify it as clickbait. In this paper, a novel approach, the two-phase hybrid CNN-LSTM Biterm model, is proposed for modeling short topic content. The hybrid CNN-LSTM model, when implemented with pre-trained GloVe embeddings, yields the best results in terms of accuracy, recall, precision, and F1-score. The proposed model achieves precision values of 91.24%, 95.64%, and 95.87% on Dataset 1, Dataset 2, and Dataset 3, respectively. Eight types of clickbait (Reasoning, Number, Reaction, Revealing, Shocking/Unbelievable, Hypothesis/Guess, Questionable, and Forward referencing) are classified in this work using the Biterm Topic Model (BTM). The results show that Shocking/Unbelievable, Hypothesis/Guess, and Reaction are the most frequent types among the clickbait headlines published online. In addition, a ground dataset of non-textual (image-based) data has been created from multiple social media platforms, with the textual information retrieved from the images using an OCR tool. A comparative study shows the effectiveness of the proposed model, which helps identify the various categories of clickbait headlines that spread on social media platforms.
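
As a rough illustration of the input that a biterm topic model works on, the sketch below extracts biterms (unordered pairs of words that co-occur in the same short text) from a few headlines and counts them. It is a minimal sketch rather than the paper's implementation: the function names `extract_biterms` and `count_biterms`, the whitespace tokenization, and the sample headlines are all assumptions made for illustration, and no topic inference is performed.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) from one short text.

    Each pair is sorted so that ("a", "b") and ("b", "a") count as the
    same biterm. Repeated words in the text produce repeated pairs.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def count_biterms(headlines):
    """Count biterms over a collection of headlines.

    Tokenization here is a plain lowercase whitespace split, which is an
    assumption for illustration only.
    """
    counts = Counter()
    for headline in headlines:
        tokens = headline.lower().split()
        counts.update(extract_biterms(tokens))
    return counts


# Hypothetical sample headlines, not taken from the paper's datasets.
headlines = ["you will not believe this", "believe this trick"]
biterm_counts = count_biterms(headlines)
```

Because a short headline yields only a handful of words, counting word pairs across the whole corpus gives a denser co-occurrence signal than per-document word counts, which is the motivation usually given for biterm-based topic modeling of short texts.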
