Language Independent Sentence-Level Subjectivity Analysis with Feature Selection

Identifying and extracting subjective information from news, blogs, and other user-generated content has many applications. Most earlier work concentrated on English data, but sentence-level subjectivity research in other languages has recently increased. In this paper, we perform sentence-level subjectivity classification using language-independent feature weighting and selection methods that are consistent across languages. Experiments on five languages, including English and the South Asian language Hindi, show that the entropy-based category coverage difference (ECCD) feature selection method, combined with language-independent feature weighting methods, outperforms other approaches for subjectivity classification.
