The Sarcasm Detection in News Headlines Based on Machine Learning Technology

This work develops a Python program that detects sarcasm in news headlines using machine learning methods. The introduction describes the relevance, scientific novelty, and practical value of the program. The paper reviews prior research on detecting sarcasm in text and considers the main problems previous researchers faced when predicting sarcastic statements. A Kaggle dataset that collects news headlines from two American websites is well suited to the task. The chosen methods are logistic regression and a Bayesian classifier trained on TF-IDF vectors built with simple tokenisation, together with LSTM and GRU neural networks. GloVe and Word2Vec were selected to produce the embedding weights. The data are analysed, the program's design and structure are described, and the experimental results and the performance of the created models are examined. The neural network with embedding weights built using GloVe achieved the best accuracy, 80.5%. It is followed by the Bayesian classifier on TF-IDF vectors (78.9%), the neural network with Word2Vec weights (77.2%), and the neural network without pretrained weights (76.6%). The logistic regression model on TF-IDF vectors has the lowest accuracy in the results table, 74%.
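The TF-IDF plus Bayesian classifier baseline described above can be sketched in pure Python. This is a minimal sketch, not the authors' implementation: it assumes a multinomial naive Bayes model (the abstract says only "Bayesian classifier"), a smoothed IDF with L2 row normalisation in the style of scikit-learn's defaults, and a simple lowercase regex tokeniser. The class name `TfidfNaiveBayes` and the toy headlines are hypothetical and are not taken from the Kaggle dataset.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simple tokenisation (assumption): lowercase, keep runs of letters, digits, apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def fit_idf(docs: list[list[str]]) -> dict[str, float]:
    # Smoothed IDF (assumed variant): ln((1 + n) / (1 + df)) + 1.
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}


def tfidf(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    # Term count times IDF, L2-normalised; words unseen during fitting are dropped.
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


class TfidfNaiveBayes:
    """Hypothetical multinomial naive Bayes over TF-IDF weights with Laplace smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, texts: list[str], labels: list[int]) -> "TfidfNaiveBayes":
        docs = [tokenize(t) for t in texts]
        self.idf = fit_idf(docs)
        vocab = list(self.idf)
        self.classes = sorted(set(labels))
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            rows = [tfidf(d, self.idf) for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(rows) / len(docs))
            # Sum TF-IDF mass per word within the class, then smooth and normalise.
            totals = Counter()
            for row in rows:
                for t, w in row.items():
                    totals[t] += w
            denom = sum(totals.values()) + self.alpha * len(vocab)
            self.log_lik[c] = {t: math.log((totals[t] + self.alpha) / denom) for t in vocab}
        return self

    def predict(self, text: str) -> int:
        vec = tfidf(tokenize(text), self.idf)
        scores = {
            c: self.log_prior[c] + sum(w * self.log_lik[c][t] for t, w in vec.items())
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Toy, made-up headlines (1 = sarcastic, 0 = not sarcastic).
TRAIN = [
    ("area man thrilled to spend entire weekend answering work emails", 1),
    ("nation celebrates as meeting that could have been an email runs long", 1),
    ("local man heroically finishes entire pizza alone", 1),
    ("senate passes budget bill after lengthy debate", 0),
    ("scientists report new findings on ocean temperatures", 0),
    ("city council approves funding for new public library", 0),
]

model = TfidfNaiveBayes().fit([t for t, _ in TRAIN], [y for _, y in TRAIN])
print(model.predict("area man thrilled about another meeting"))
print(model.predict("council approves new budget"))
```

The standard library is used only so the sketch runs without dependencies; in practice scikit-learn's `TfidfVectorizer` with `MultinomialNB` or `LogisticRegression` covers the same baselines. A six-headline toy set says nothing about the accuracies reported above.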