SUBJECTIVITY WORD SENSE DISAMBIGUATION: A METHOD FOR SENSE-AWARE SUBJECTIVITY ANALYSIS

Subjectivity lexicons are invaluable resources for subjectivity analysis, and their creation has been an important research topic. Many systems rely on these lexicons. For any subjectivity analysis system that relies on a subjectivity lexicon, subjectivity sense ambiguity is a serious problem: such systems are misled by subjectivity clues that are used with objective senses in context, which we call false hits. We believe that any type of subjectivity analysis system relying on lexicons will benefit from a sense-aware approach. We think sense-aware subjectivity analysis has been neglected mostly because of concerns about word sense disambiguation (WSD), the problem of automatically determining which sense of a word, according to a sense inventory, is activated by its use in a particular context. Although WSD is a natural tool for sense-aware classification, trust in traditional fine-grained WSD as an enabling technology is low because previous results have been mostly unsuccessful. In this thesis, we investigate feasible and practical methods to avoid these false hits via sense-aware analysis. We define a new coarse-grained WSD task that captures the semantic granularity specific to subjectivity analysis.
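To make the notion of a false hit concrete, the following is a minimal sketch, not the method developed in the thesis. It contrasts plain lexicon lookup with a sense-aware lookup that discards clue instances labeled objective. The lexicon entries, the cue words, and the names `toy_sense_classifier` and `find_clues` are all hypothetical and chosen only for illustration; a real system would replace the toy classifier with a trained coarse-grained (subjective vs. objective) sense classifier.

```python
from typing import Callable, List, Optional

# Hypothetical toy lexicon of subjectivity clues (illustrative only).
SUBJECTIVITY_LEXICON = {"alarm", "pain"}

# Hypothetical context words that signal an objective sense of a clue.
OBJECTIVE_CUES = {
    "alarm": {"clock", "fire", "burglar"},
}


def toy_sense_classifier(word: str, context: List[str]) -> str:
    """Stand-in for a coarse-grained sense classifier.

    Returns "O" (objective sense) if any cue word for `word` occurs in
    the context, otherwise "S" (subjective sense).
    """
    cues = OBJECTIVE_CUES.get(word, set())
    return "O" if cues & set(context) else "S"


def find_clues(
    tokens: List[str],
    sense_classifier: Optional[Callable[[str, List[str]], str]] = None,
) -> List[str]:
    """Return the lexicon clues found in `tokens`.

    Without a classifier this is plain lexicon lookup. With one, clue
    instances labeled "O" are dropped, since they would be false hits.
    """
    context = [t.lower() for t in tokens]
    hits = []
    for word in context:
        if word not in SUBJECTIVITY_LEXICON:
            continue
        if sense_classifier is not None and sense_classifier(word, context) == "O":
            continue  # objective usage of a subjectivity clue: skip it
        hits.append(word)
    return hits


objective_use = "The alarm clock rang at six".split()
subjective_use = "Her remarks caused alarm among investors".split()

naive_hits = find_clues(objective_use)                          # lexicon lookup only
aware_hits = find_clues(objective_use, toy_sense_classifier)    # sense-aware lookup
kept_hits = find_clues(subjective_use, toy_sense_classifier)    # subjective use is kept
```

In this sketch the classifier only needs to decide between two coarse classes per clue instance, which is the sense in which the task is coarser than distinguishing every fine-grained sense in an inventory.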
