A Cross-corpus Study of Unsupervised Subjectivity Identification based on Calibrated EM

In this study, we investigate an unsupervised generative learning method for subjectivity detection in text across different domains. We create an initial training set using simple lexicon information, then apply a calibrated EM (expectation-maximization) method to learn from unannotated data. We evaluate this unsupervised approach on three domains: movie data, news resources, and meeting dialogues. We also analyze the factors that affect unsupervised learning, such as the size and self-labeling accuracy of the initial training set. Our experiments and analysis show inherent differences across domains and a performance gain from calibration in EM.
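The pipeline described above (lexicon-based self-labeling followed by calibrated EM) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a multinomial naive Bayes model as the generative classifier, a toy lexicon (`SUBJ_WORDS`), a hypothetical seeding rule (two or more lexicon hits means subjective, zero hits means objective), and one simple form of calibration: rescaling the posteriors after each E-step so that the average class posterior matches a fixed target prior. The function names and thresholds are illustrative only.

```python
import numpy as np

# Hypothetical toy lexicon; the lexicon used in the study is not given here.
SUBJ_WORDS = {"love", "hate", "great", "terrible", "think"}

def seed_labels(docs):
    """Self-label with the lexicon: 1 = subjective, 0 = objective, -1 = unlabeled.
    Assumed rule: >= 2 lexicon hits -> subjective, 0 hits -> objective."""
    labels = []
    for d in docs:
        hits = sum(w in SUBJ_WORDS for w in d.split())
        labels.append(1 if hits >= 2 else 0 if hits == 0 else -1)
    return np.array(labels)

def build_counts(docs):
    """Bag-of-words count matrix of shape (n_docs, n_vocab)."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for row, d in enumerate(docs):
        for w in d.split():
            X[row, index[w]] += 1
    return X, vocab

def calibrated_em(X, labels, prior=(0.5, 0.5), n_iter=10, alpha=1.0):
    """Naive Bayes EM with an assumed calibration step on the posteriors."""
    target = np.asarray(prior, dtype=float)
    seeded = labels >= 0
    R = np.full((X.shape[0], 2), 0.5)          # class responsibilities
    R[seeded] = np.eye(2)[labels[seeded]]      # seeds stay fixed
    for _ in range(n_iter):
        # M-step: smoothed per-class word distributions
        counts = R.T @ X + alpha
        log_pw = np.log(counts / counts.sum(axis=1, keepdims=True))
        # E-step: posterior over classes for every document
        log_post = X @ log_pw.T + np.log(target)
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
        # Calibration (assumption): pull the mean posterior toward the target prior
        post *= target / np.maximum(post.mean(axis=0), 1e-12)
        post /= post.sum(axis=1, keepdims=True)
        R[~seeded] = post[~seeded]
    return R

docs = [
    "i love this great movie",
    "i hate this terrible plot",
    "the meeting starts at noon",
    "the report was filed monday",
    "the movie was great",
    "we think the report starts monday",
]
labels = seed_labels(docs)
X, vocab = build_counts(docs)
R = calibrated_em(X, labels)   # R[i, 1] is the estimated P(subjective | doc i)
```

In this sketch the seeded documents keep their lexicon labels throughout, and only the unlabeled documents receive soft labels from EM. Whether seeds should also be re-estimated, and which target prior suits each domain, are design choices that the sketch leaves open.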
