Spin: lexical semantics, transitivity, and the identification of implicit sentiment

Current interest in automatic sentiment analysis is motivated by a variety of information requirements. The vast majority of work in sentiment analysis has specifically targeted the detection of subjective statements and the mining of opinions. This dissertation focuses on a different but related problem that to date has received relatively little attention in NLP research: detecting implicit sentiment, or spin, in text. This text classification task is distinguished from other sentiment analysis work in that there is no assumption that the documents to be classified with respect to sentiment are necessarily overt expressions of opinion. Rather, they are documents that might reveal a perspective. This dissertation describes a novel approach to the identification of implicit sentiment, motivated by ideas drawn from the literature on lexical semantics and argument structure, and supported and refined through psycholinguistic experimentation. A relationship predictive of sentiment is established for components of meaning that are thought to be drivers of verbal argument selection and linking and to be arbiters of what is foregrounded or backgrounded in discourse. In computational experiments employing targeted lexical selection for verbs and nouns, a set of features reflective of these components of meaning is extracted for the selected terms. As observable proxies for the underlying semantic components, these features are exploited using machine learning methods for text classification with respect to perspective. After initial experimentation with manually selected lexical resources, the method is generalized to require no manual selection or hand tuning of any kind. The robustness of this linguistically motivated method is demonstrated by successfully applying it to three distinct text domains under a number of different experimental conditions, obtaining the best classification accuracies yet reported for several sentiment classification tasks.
A novel graph-based classifier combination method is introduced, which further improves classification accuracy by integrating statistical classifiers with models of inter-document relationships.
