ConSent: Context-based sentiment analysis

Abstract: We present ConSent, a novel context-based approach to sentiment analysis. Our approach builds on techniques from the field of information retrieval to identify key terms that indicate the presence of sentiment. We model these terms and the contexts in which they appear, and use them to generate features for supervised learning. The two major strengths of the proposed model are its robustness to noise and the ease with which features from multiple sources can be added to the feature set. Empirical evaluation over multiple real-world domains demonstrates the merit of our approach compared to state-of-the-art methods, on both noiseless and noisy text.
