Stacking-Based Ensemble Learning on Low Dimensional Features for Fake News Detection

With the arrival of the self-media age, anyone can author content in today's big-data media environment. As a result, a large volume of fake news now circulates online. The authors of fake news mislead the public by spreading it, often for economic or social gain. Existing work relies on many different types of article features in the hope of identifying fake news accurately, but this reliance limits how well those methods generalize. In this paper, we propose a pipeline that combines preprocessing, feature extraction, and model fusion for more accurate and automated prediction. Specifically, we use stacking to fuse latent semantic analysis (LSA) with the results of ensemble learning models. Experimental analysis on real-world data demonstrates that our pipeline achieves higher accuracy than existing approaches.
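
The pipeline outlined in the abstract (preprocessing, LSA feature extraction, stacked ensemble) might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: it assumes scikit-learn, approximates LSA with TF-IDF followed by truncated SVD, and picks a random forest and gradient boosting as base learners with logistic regression as the meta-learner. The abstract specifies none of these choices, and the toy headlines, labels, and the name `build_pipeline` are hypothetical.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus: 1 = fake, 0 = real (not data from the paper).
TEXTS = [
    "shocking miracle cure doctors hate revealed",
    "aliens secretly control the government shocking proof",
    "miracle pill melts fat overnight secret revealed",
    "secret proof celebrity clone shocking",
    "city council approves budget for road repairs",
    "central bank holds interest rates steady",
    "university publishes study on road safety",
    "council reports budget surplus this quarter",
]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]


def build_pipeline(n_components=3, seed=0):
    """Return a text -> LSA -> stacked-ensemble classifier (assumed design)."""
    # Preprocessing + feature extraction: lowercasing and stop-word removal,
    # then TF-IDF reduced to a few latent dimensions (LSA via truncated SVD).
    lsa = make_pipeline(
        TfidfVectorizer(lowercase=True, stop_words="english"),
        TruncatedSVD(n_components=n_components, random_state=seed),
    )
    # Model fusion: base learners are assumptions; their out-of-fold
    # predictions feed a logistic-regression meta-learner (stacking).
    stack = StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=50, random_state=seed)),
            ("gb", GradientBoostingClassifier(random_state=seed)),
        ],
        final_estimator=LogisticRegression(),
        cv=2,  # small only because the toy corpus is tiny
    )
    return make_pipeline(lsa, stack)


if __name__ == "__main__":
    model = build_pipeline()
    model.fit(TEXTS, LABELS)
    print(model.predict(["shocking secret miracle revealed"]))
```

Keeping the LSA dimensionality small means the base learners see only a handful of dense features per article, which is the low-dimensional setting the title refers to. On a real corpus, `n_components` and `cv` would need to be considerably larger than in this toy setup.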
