A Hybrid Method for Fake News Detection using Cosine Similarity Scores
In this work, we propose a novel hybrid method for fake news detection. Two approaches are used to assess the authenticity of news articles using web-scraped data. In the first approach, the data is pre-processed using NLP techniques: extraction of raw text and removal of special characters, extra white space, and stop words. This is followed by lemmatization, which reduces inflected word forms to a common base form (lemma) so that related words are grouped together. After lemmatization, we apply Term Frequency-Inverse Document Frequency (TF-IDF) vectorization to form a corpus, which is then used to train the models. To improve classification accuracy, we propose using a cosine similarity score, obtained after performing topic modelling, alongside this corpus. The classifiers used to determine whether a news item is reliable or unreliable are KNN, Decision Tree, Naive Bayes, Logistic Regression, Passive-Aggressive Classifier, and SVM. Particular focus is given to improving the classification accuracy of the Passive-Aggressive Classifier, which is the most widely used classifier in fake news detection. In the second approach, we use an ensemble learning technique called stacking, together with the cosine similarity score, to train another model that labels the news as reliable or unreliable. We observe that the second approach yields a clear improvement in fake news detection accuracy.
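The first approach can be illustrated with a minimal plain-Python sketch. It is hypothetical and not the authors' implementation: the stop-word list is illustrative, lemmatization is omitted, the smoothed IDF formula is one common choice, and, as a stand-in for the topic model, the "topic" vector is simply the centroid of the training TF-IDF vectors. The cosine similarity between each article and that vector is appended to the article's TF-IDF features, and a PA-I passive-aggressive classifier is trained on the combined features. All function names and the toy articles are made up for illustration.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list (assumption; real pipelines usually take a
# library list such as NLTK's). Lemmatization is omitted in this sketch.
STOP_WORDS = {"a", "an", "and", "are", "for", "from", "in", "is", "of",
              "on", "the", "they", "to", "was", "you"}


def preprocess(text):
    """Lowercase, replace special characters with spaces, and drop stop words."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def fit_idf(docs):
    """Smoothed IDF: log((1 + n) / (1 + df)) + 1 (one common variant)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def tfidf(doc, idf):
    """L2-normalised TF-IDF vector as a sparse dict; unseen terms are ignored."""
    vec = {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    """Mean of sparse vectors; a stand-in for a topic-model vector (assumption)."""
    total = {}
    for vec in vectors:
        for t, w in vec.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}


def hybrid_features(doc, idf, topic):
    """TF-IDF features plus the cosine score against the topic vector and a bias."""
    x = tfidf(doc, idf)
    x["__cos__"] = cosine(x, topic)  # extra feature from the "topic" similarity
    x["__bias__"] = 1.0
    return x


def train_pa(samples, labels, C=1.0, epochs=10):
    """PA-I passive-aggressive classifier; labels are +1 (reliable) / -1 (unreliable)."""
    w = {}
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * sum(w.get(t, 0.0) * v for t, v in x.items())
            loss = max(0.0, 1.0 - margin)  # hinge loss
            sq_norm = sum(v * v for v in x.values())
            if loss > 0.0 and sq_norm > 0.0:
                tau = min(C, loss / sq_norm)  # aggressive step, capped by C
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + tau * y * v
    return w


def predict(w, x):
    score = sum(w.get(t, 0.0) * v for t, v in x.items())
    return 1 if score >= 0.0 else -1


# Toy, made-up training data.
train_texts = [
    "Officials confirmed the budget figures in a public report.",
    "The ministry published the official report on the budget.",
    "SHOCKING secret cure doctors hate, revealed!!!",
    "Secret miracle cure they hide from you... shocking!",
]
train_labels = [1, 1, -1, -1]

docs = [preprocess(t) for t in train_texts]
idf = fit_idf(docs)
topic = centroid([tfidf(d, idf) for d in docs])
X = [hybrid_features(d, idf, topic) for d in docs]
w = train_pa(X, train_labels, C=1.0, epochs=20)

new_doc = preprocess("New report confirms the budget figures")
print(predict(w, hybrid_features(new_doc, idf, topic)))
```

Replacing the centroid with the output of a real topic model changes only how `topic` is built; the feature construction and classifier stay the same. The second (stacking) approach would instead feed the predictions of several base classifiers, together with the cosine score, into a meta-classifier, which is not shown here.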