Identifying Clickbait: A Multi-Strategy Approach Using Neural Networks

Online media outlets, in a bid to expand their reach and subsequently increase revenue through ad monetisation, have begun adopting clickbait techniques to lure readers to click on articles. The article fails to fulfill the promise made by the headline. Traditional methods for clickbait detection have relied heavily on feature engineering which, in turn, is dependent on the dataset it is built for. The application of neural networks for this task has only been explored partially. We propose a novel approach considering all information found in a social media post. We train a bidirectional LSTM with an attention mechanism to learn the extent to which a word contributes to the post's clickbait score in a differential manner. We also employ a Siamese net to capture the similarity between source and target information. Information gleaned from images has not been considered in previous approaches. We learn image embeddings from large amounts of data using Convolutional Neural Networks to add another layer of complexity to our model. Finally, we concatenate the outputs from the three separate components, serving it as input to a fully connected layer. We conduct experiments over a test corpus of 19538 social media posts, attaining an F1 score of 65.37% on the dataset bettering the previous state-of-the-art, as well as other proposed approaches, feature engineering or otherwise.

[1]  Quoc V. Le,et al.  Distributed Representations of Sentences and Documents , 2014, ICML.

[2]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[3]  G. Loewenstein The psychology of curiosity: A review and reinterpretation. , 1994 .

[4]  Cícero Nogueira dos Santos,et al.  Learning Character-level Representations for Part-of-Speech Tagging , 2014, ICML.

[5]  Maarten Versteegh,et al.  Learning Text Similarity with Siamese Recurrent Networks , 2016, Rep4NLP@ACL.

[6]  Matthias Hagen,et al.  Crowdsourcing a Large Corpus of Clickbait on Twitter , 2018, COLING.

[7]  Matthias Hagen,et al.  Clickbait Detection , 2016, ECIR.

[8]  Niloy Ganguly,et al.  Stop Clickbait: Detecting and preventing clickbaits in online news media , 2016, 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM).

[9]  Tanmoy Chakraborty,et al.  We Used Neural Networks to Detect Clickbaits: You Won't Believe What Happened Next! , 2016, ECIR.

[10]  David S. Touretzky,et al.  Advances in neural information processing systems 2 , 1989 .

[11]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[12]  Shie Mannor,et al.  A Tutorial on the Cross-Entropy Method , 2005, Ann. Oper. Res..

[13]  Philippe Thomas Clickbait Identification using Neural Networks , 2017, ArXiv.

[14]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[15]  Yiwei Zhou,et al.  Clickbait Detection in Tweets Using Self-attentive Network , 2017, ArXiv.

[16]  Lawrence D. Jackel,et al.  Handwritten Digit Recognition with a Back-Propagation Network , 1989, NIPS.

[17]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[18]  Daoud M. Daoud,et al.  Clickbait Detection , 2018, ICSIE '18.

[19]  Prakhar Biyani,et al.  "8 Amazing Secrets for Getting More Clicks": Detecting Clickbaits in News Streams Using Article Informality , 2016, AAAI.

[20]  Matthew D. Zeiler ADADELTA: An Adaptive Learning Rate Method , 2012, ArXiv.

[21]  Petr Sojka,et al.  Software Framework for Topic Modelling with Large Corpora , 2010 .

[22]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.