Building a General-Purpose Cross-Domain Sentiment Mining Model

Building a machine learning model that classifies the sentiment of natural language text often requires an extensive set of labeled training data from the same domain as the target text. Gathering and labeling a new dataset whenever a model is needed for a new domain is time-consuming and difficult, especially when no dataset with numeric ratings (which can stand in for sentiment labels) is available. In this paper, we consider the problem of building models that achieve high sentiment classification accuracy without a labeled dataset from the target domain. We show that ensembles of models trained on existing domains can achieve a classification accuracy approaching that of models trained on data from the target domain.
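The abstract does not specify the base learner or how the existing domain models are combined. The following is a minimal sketch of the general idea, assuming one multinomial naive Bayes classifier per source domain and an unweighted majority vote across them. Both are illustrative choices, not necessarily the paper's method. The names `NaiveBayes`, `ensemble_predict`, and `source_domains` are hypothetical, and the review snippets are invented toy data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic tokens (apostrophes allowed); a simplification.
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing, trained on one source domain."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)  # log prior
            for w in tokenize(text):
                if w in self.vocab:  # words never seen in this source domain are skipped
                    score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


def ensemble_predict(models, text):
    # Unweighted majority vote over the source-domain models; on a tie,
    # Counter.most_common returns the label that was voted for first.
    votes = Counter(m.predict(text) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy source domains (invented snippets, not the paper's data).
source_domains = {
    "books": (["a gripping and wonderful story", "boring plot and dull characters"],
              ["pos", "neg"]),
    "electronics": (["great battery and excellent sound", "broke after a week terrible"],
                    ["pos", "neg"]),
    "kitchen": (["excellent blender works great", "cheap and terrible broke quickly"],
                ["pos", "neg"]),
}
models = [NaiveBayes().fit(docs, labels) for docs, labels in source_domains.values()]

# Classify text from an unseen target domain (movie reviews) using no target labels.
print(ensemble_predict(models, "wonderful movie with great acting"))  # → pos
print(ensemble_predict(models, "terrible acting and a dull plot"))    # → neg
```

Natural variations include weighting each vote by the source model's confidence or by how similar its domain is to the target. The abstract alone does not indicate which combination scheme, if any, the paper uses.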
