The Good, the Bad and the Bait: Detecting and Characterizing Clickbait on YouTube

Deceptive techniques are ubiquitous on user-generated video portals. Unscrupulous uploaders deliberately mislabel video descriptors to increase their views and, in turn, their ad revenue. This problem, usually referred to as "clickbait," may severely undermine user experience. In this work, we study the clickbait problem on YouTube by collecting metadata for 206k videos. To address it, we devise a deep learning model based on variational autoencoders that supports the diverse modalities of data that videos include. The proposed model relies on a limited amount of manually labeled data to classify a large corpus of unlabeled data. Our evaluation indicates that the proposed model performs better than conventional models. Our analysis of the collected data indicates that YouTube's recommendation engine does not take clickbait into account and is therefore susceptible to recommending misleading videos to users.
