Dual word and document seed selection for semi-supervised sentiment classification

Semi-supervised sentiment classification aims to train a classifier from a small amount of labeled data (called seed data) and a large amount of unlabeled data. A major advantage of this approach is that it reduces annotation effort by exploiting unlabeled data, which is usually freely available. In this paper, we propose an approach that further reduces the annotation effort of semi-supervised sentiment classification by actively selecting the seed data. Specifically, we propose a novel selection strategy that simultaneously selects words and documents for manual annotation by considering both their annotation cost and their informativeness. Experimental results demonstrate the effectiveness of our approach.
