Filling the Gap: Semi-Supervised Learning for Opinion Detection Across Domains

We investigate the use of Semi-Supervised Learning (SSL) in opinion detection both in sparse data situations and for domain adaptation. We show that co-training reaches the best results in an in-domain setting with small labeled data sets, with a maximum absolute gain of 33.5%. For domain transfer, we show that self-training gains an absolute improvement in labeling accuracy for blog data of 16% over the supervised approach with target domain training data.

[1]  Janyce Wiebe,et al.  A Corpus Study of Evaluative and Speculative Language , 2001, SIGDIAL Workshop.

[2]  Sebastian Thrun,et al.  Text Classification from Labeled and Unlabeled Documents using EM , 2000, Machine Learning.

[3]  Partha Pratim Talukdar,et al.  Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition , 2010, ACL.

[4]  Hongbo Xu,et al.  Adapting Naive Bayes to Domain Adaptation for Sentiment Analysis , 2009, ECIR.

[5]  Miriam Eckert,et al.  The ICWSM 2010 JDPA Sentiment Corpus for the Automotive Domain , 2010 .

[6]  Wei Zhang,et al.  UIC at TREC 2007 Blog Track , 2007, TREC.

[7]  Avrim Blum,et al.  The Bottleneck , 2021, Monopsony Capitalism.

[8]  Zhi-Hua Zhou,et al.  Analyzing Co-training Style Algorithms , 2007, ECML.

[9]  Hui Zhang,et al.  WIDIT in TREC 2007 Blog Track: Combining Lexicon-Based Methods to Detect Opinionated Blogs , 2007, TREC.

[10]  Bo Pang,et al.  A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts , 2004, ACL.

[11]  Rohini K. Srihari,et al.  OpinionMiner: a novel machine learning system for web opinion mining and extraction , 2009, KDD.

[12]  Janyce Wiebe,et al.  Learning Subjective Language , 2004, CL.

[13]  Takashi Inui,et al.  Latent Variable Models for Semantic Orientations of Phrases , 2006, EACL.

[14]  Clement Yu,et al.  UIC at TREC 2008 Blog Track , 2008 .

[15]  Janyce Wiebe,et al.  Development and Use of a Gold-Standard Data Set for Subjectivity Classifications , 1999, ACL.

[17]  Thorsten Joachims,et al.  Making large scale SVM learning practical , 1998 .

[18]  Hong Yu,et al.  Towards Answering Opinion Questions: Separating Facts from Opinions and Identifying the Polarity of Opinion Sentences , 2003, EMNLP.

[19]  Rohini K. Srihari,et al.  Using Verbs and Adjectives to Automatically Classify Blog Sentiment , 2006, AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs.

[20]  Sandra Kübler,et al.  Semi-supervised Learning for Opinion Detection , 2010, 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology.

[21]  Ian H. Witten,et al.  The WEKA data mining software: an update , 2009, SKDD.

[22]  ThrunSebastian,et al.  Text Classification from Labeled and Unlabeled Documents using EM , 2000 .

[23]  Ellen Riloff,et al.  Creating Subjective and Objective Sentence Classifiers from Unannotated Texts , 2005, CICLing.

[24]  Ellen Riloff,et al.  Learning Extraction Patterns for Subjective Expressions , 2003, EMNLP.

[25]  Mirella Lapata,et al.  Semi-Supervised Semantic Role Labeling , 2009, EACL.

[26]  Rayid Ghani,et al.  Analyzing the effectiveness and applicability of co-training , 2000, CIKM '00.

[27]  John Blitzer,et al.  Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification , 2007, ACL.