A review of natural language processing techniques for opinion mining systems

Abstract With the prevalence of social media on the Internet, opinion mining has become an essential approach to analyzing the resulting volume of data. Opinion mining applications have emerged across a wide range of industrial domains. Meanwhile, opinions are expressed in diverse ways, which poses research challenges. Both the practical demands and the research challenges have made opinion mining an active research area in recent years. In this paper, we present a review of Natural Language Processing (NLP) techniques for opinion mining. First, we introduce the general NLP techniques that are required for text preprocessing. Second, we investigate opinion mining approaches at different levels and in different situations. Then we introduce comparative opinion mining and deep learning approaches for opinion mining. After that, we cover opinion summarization and advanced topics. Finally, we discuss some challenges and open problems related to opinion mining.