A hybrid model for opinion mining based on domain sentiment dictionary

Sentiment classification is an application of sentiment analysis, a popular research field in natural language processing (NLP), that assigns documents to categories according to the sentiment they express. A sentiment classification task has two steps: extracting sentiment features from the documents, and then classifying the documents with a classifier. In the first step, the traditional way to extract sentiment features is to apply a sentiment dictionary. However, a sentiment word may have different sentiment tendencies in different contexts, and traditional sentiment dictionaries do not account for this, so the wrong tendency may be assigned to a word. In our research, we find that sentiment words do not take on diverse meanings once they are associated with the nearby aspects and entities in a document. We therefore propose a three-layer sentiment dictionary that associates each sentiment word with its corresponding entity and aspect to reduce this ambiguity. In the second step, many classification models, such as support vector machines (SVM) and gradient boosting decision trees (GBDT), can classify documents according to the extracted sentiment words. However, different classifiers have different weaknesses. We apply a Stacking-based hybrid model that combines SVM and GBDT to overcome their individual weaknesses and reach higher performance. The hybrid model contains two layers, with the output of the first layer serving as the input of the second. The first layer produces one classification result per classifier, and the second layer automatically learns how to select the most probable one as the final result. The experimental results show that our hybrid model outperforms the baseline single models.
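
The idea of tying a sentiment word to its entity and aspect can be illustrated with a minimal sketch. This is not the dictionary built in the paper: the nested layout (entity, then aspect, then sentiment word), the example entries, the fallback dictionary, and the names `DOMAIN_DICT`, `GENERAL_DICT`, `polarity`, and `score` are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical three-layer dictionary: entity -> aspect -> sentiment word -> polarity.
# All entries are illustrative and are not taken from the paper.
DOMAIN_DICT: Dict[str, Dict[str, Dict[str, int]]] = {
    "phone": {
        "battery": {"long": 1, "short": -1},   # a long battery life is positive
        "startup": {"long": -1, "fast": 1},    # a long startup time is negative
    },
}

# Hypothetical context-free dictionary used when no entity/aspect entry matches.
GENERAL_DICT: Dict[str, int] = {"long": 0, "fast": 1, "good": 1, "bad": -1}


def polarity(word: str,
             entity: Optional[str] = None,
             aspect: Optional[str] = None) -> int:
    """Return +1, -1, or 0 for a sentiment word, preferring the
    entity/aspect-specific entry over the context-free one."""
    by_aspect = DOMAIN_DICT.get(entity, {}).get(aspect, {})
    if word in by_aspect:
        return by_aspect[word]
    return GENERAL_DICT.get(word, 0)


def score(triples: Iterable[Tuple[str, str, str]]) -> int:
    """Sum the polarities of (entity, aspect, sentiment word) triples
    that were extracted from one document."""
    return sum(polarity(word, entity, aspect) for entity, aspect, word in triples)
```

In this sketch the word "long" resolves to a positive polarity for a phone's battery and a negative one for its startup, while a lookup without context falls back to the neutral general entry. A score of this kind could serve as one input feature for the classifiers in the second step, although how the paper derives its features is not specified here.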
