Identifying Sentiment Words Using an Optimization Model with L1 Regularization

Sentiment word identification is a fundamental task in numerous applications of sentiment analysis and opinion mining, such as review mining, opinion holder finding, and Twitter classification. In this paper, we propose an optimization model with L1 regularization, called ISOMER, for identifying sentiment words from a corpus. Unlike most existing approaches, which rely on seed words alone, our model can exploit both seed words and documents with sentiment labels. Because most candidate words carry no sentiment, the L1 penalty in the objective function is used to yield a sparse solution. Experiments on real datasets show that ISOMER outperforms the classic approaches, and that the lexicon learned by ISOMER can be effectively adapted to document-level sentiment analysis.
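
The general idea of combining labeled documents, seed words, and an L1 penalty can be illustrated with a minimal sketch. This is not ISOMER's actual objective, which the abstract does not state. The sketch assumes a logistic loss over bag-of-words documents, a quadratic term pulling seed words toward their known polarity, and a proximal-gradient (soft-thresholding) solver. The function names, the toy corpus, and the parameters `lam`, `mu`, `step`, and `iters` are all hypothetical.

```python
import math


def soft_threshold(value, threshold):
    """Proximal operator of the L1 norm: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def learn_lexicon(docs, labels, seeds, lam=0.05, mu=1.0, step=0.5, iters=500):
    """Toy L1-regularized polarity learner (an assumption, not ISOMER itself).

    docs   : list of token lists
    labels : +1 / -1 document sentiment labels
    seeds  : dict mapping seed word -> known polarity (+1.0 or -1.0)
    Returns a dict holding only the words whose learned weight is nonzero.
    """
    vocab = sorted({w for doc in docs for w in doc} | set(seeds))
    index = {w: i for i, w in enumerate(vocab)}
    weights = [0.0] * len(vocab)

    for _ in range(iters):
        grad = [0.0] * len(vocab)
        # Gradient of the mean logistic loss log(1 + exp(-y * score)).
        for doc, y in zip(docs, labels):
            score = sum(weights[index[w]] for w in doc)
            coeff = -y / (1.0 + math.exp(y * score))
            for w in doc:
                grad[index[w]] += coeff / len(docs)
        # Gradient of the seed term (mu / 2) * (weight - polarity)^2.
        for word, polarity in seeds.items():
            i = index[word]
            grad[i] += mu * (weights[i] - polarity)
        # Gradient step on the smooth part, then the L1 proximal step.
        weights = [
            soft_threshold(w - step * g, step * lam)
            for w, g in zip(weights, grad)
        ]

    return {w: weights[index[w]] for w in vocab if weights[index[w]] != 0.0}


# Hypothetical toy corpus: two positive and two negative "reviews".
docs = [
    ["the", "movie", "was", "great"],
    ["great", "acting", "and", "the", "plot"],
    ["the", "movie", "was", "awful"],
    ["awful", "plot", "and", "the", "acting"],
]
labels = [1, 1, -1, -1]
seeds = {"great": 1.0, "awful": -1.0}

lexicon = learn_lexicon(docs, labels, seeds)
print(lexicon)  # only "great" (positive) and "awful" (negative) survive
```

In this toy setting, words such as "the" and "plot" occur equally often in positive and negative documents, so the soft-thresholding step keeps their weights at exactly zero. This is the sparsity effect that motivates an L1 penalty over an L2 penalty.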
