The airline industry is a highly competitive market that has grown rapidly over the past two decades. Airlines traditionally rely on customer feedback forms, which are tedious and time-consuming. Twitter offers an alternative source of customer feedback: tweets can be collected and subjected to sentiment analysis. In this paper, we worked on a dataset of tweets about six major US airlines and performed a multi-class sentiment analysis. The approach begins with pre-processing techniques that clean the tweets, then represents the cleaned tweets as vectors using a deep learning technique (Doc2Vec) to support a phrase-level analysis. The analysis was carried out with seven classification strategies: Decision Tree, Random Forest, SVM, K-Nearest Neighbors, Logistic Regression, Gaussian Naïve Bayes, and AdaBoost. The classifiers were trained on 80% of the data and tested on the remaining 20%. For each tweet in the test set, the output is a predicted sentiment (positive, negative, or neutral). From these results, we calculated the accuracy of each classifier to compare the approaches, and visualized the overall sentiment counts across all six airlines combined.
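As a rough illustration of the pipeline's outer steps, the sketch below covers tweet cleaning, the 80/20 split, and the accuracy calculation using only the Python standard library. The abstract does not list the specific cleaning rules, so the ones here (lowercasing, removing URLs, @mentions, and non-letter characters) are assumptions, and the function names `clean_tweet`, `split_80_20`, and `accuracy` are hypothetical. The Doc2Vec embedding and the seven classifiers are deliberately left out.

```python
import random
import re

# Assumed cleaning rules; the paper does not enumerate its own.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def clean_tweet(text):
    """Lowercase a tweet, strip URLs, @mentions, and non-letters, then tokenize."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")  # keep the hashtag word, drop the symbol
    text = NON_ALPHA_RE.sub(" ", text)
    return text.split()


def split_80_20(items, seed=0):
    """Shuffle a copy of items and split it into 80% train and 20% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = (len(shuffled) * 8) // 10
    return shuffled[:cut], shuffled[cut:]


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true sentiment labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("cannot compute accuracy on an empty test set")
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)
```

In a fuller version, the token lists returned by `clean_tweet` would be passed to a Doc2Vec model to obtain fixed-length vectors, and each classifier would be fitted on the training portion and scored with `accuracy` on the held-out 20%.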