Customizing Sentiment Classifiers to New Domains: a Case Study

Sentiment classification is a very domain specific problem; classifiers trained in one domain do not perform well in others. Unfortunately, many domains are lacking in large amounts of labeled data for fully-supervised learning approaches. At the same time, sentiment classifiers need to be customizable to new domains in order to be useful in practice. We attempt to address these difficulties and constraints in this paper, where we survey four different approaches to customizing a sentiment classification system to a new target domain in the absence of large amounts of labeled data. We base our experiments on data from four different domains. After establishing that naive cross-domain classification results in poor classification accuracy, we compare results obtained by using each of the four approaches and discuss their advantages, disadvantages and performance.

[1]  Peter D. Turney Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews , 2002, ACL.

[2]  Saso Dzeroski,et al.  Combining Classifiers with Meta Decision Trees , 2003, Machine Learning.

[3]  Bo Pang,et al.  Thumbs up? Sentiment Classification using Machine Learning Techniques , 2002, EMNLP.

[4]  Hong Yu,et al.  Towards Answering Opinion Questions: Separating Facts from Opinions and Identifying the Polarity of Opinion Sentences , 2003, EMNLP.

[5]  Michael L. Littman,et al.  Unsupervised Learning of Semantic Orientation from a Hundred-Billion-Word Corpus , 2002, ArXiv.

[6]  Ted Dunning,et al.  Accurate Methods for the Statistics of Surprise and Coincidence , 1993, CL.

[7]  Thomas G. Dietterich Machine-Learning Research , 1997, AI Mag..

[8]  Bo Pang,et al.  A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts , 2004, ACL.

[9]  Edoardo M. Airoldi,et al.  Sentiment Extraction from Unstructured Text using Tabu Search-Enhanced Markov Blanket , 2004 .

[10]  Janyce Wiebe,et al.  Identifying Collocations for Recognizing Opinions , 2001 .

[11]  Thorsten Joachims,et al.  Text Categorization with Support Vector Machines: Learning with Many Relevant Features , 1998, ECML.

[12]  Sebastian Thrun,et al.  Text Classification from Labeled and Unlabeled Documents using EM , 2000, Machine Learning.

[13]  Shi Bing,et al.  Inductive learning algorithms and representations for text categorization , 2006 .

[14]  John C. Platt,et al.  Fast training of support vector machines using sequential minimal optimization, advances in kernel methods , 1999 .