Building Lexicon for Sentiment Analysis from Massive Collection of HTML Documents

Recognizing polarity requires a list of polar words and phrases. For the purpose of building such lexicon automatically, a lot of studies have investigated (semi-) unsupervised method of learning polarity of words and phrases. In this paper, we explore to use structural clues that can extract polar sentences from Japanese HTML documents, and build lexicon from the extracted polar sentences. The key idea is to develop the structural clues so that it achieves extremely high precision at the cost of recall. In order to compensate for the low recall, we used massive collection of HTML documents. Thus, we could prepare enough polar sentence corpus.

[1]  M. de Rijke,et al.  UvA-DARE ( Digital Academic Repository ) Using WordNet to measure semantic orientations of adjectives , 2004 .

[2]  Hiroshi Kanayama,et al.  Fully Automatic Lexicon Expansion for Domain-oriented Sentiment Analysis , 2006, EMNLP.

[3]  Vasileios Hatzivassiloglou,et al.  Predicting the Semantic Orientation of Adjectives , 1997, ACL.

[4]  Andrea Esuli,et al.  Determining the semantic orientation of terms through gloss classification , 2005, CIKM '05.

[5]  Oren Etzioni,et al.  Extracting Product Features and Opinions from Reviews , 2005, HLT.

[6]  Takashi Inui,et al.  Extracting Semantic Orientations of Words using Spin Model , 2005, ACL.

[7]  Kenneth Ward Church,et al.  Word Association Norms, Mutual Information, and Lexicography , 1989, ACL.

[8]  Janyce Wiebe,et al.  Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis , 2005, HLT.

[9]  Takashi Inui,et al.  Latent Variable Models for Semantic Orientations of Phrases , 2006, EACL.

[10]  Yuji Matsumoto,et al.  Collecting Evaluative Expressions for Opinion Extraction , 2004, IJCNLP.

[11]  Peter D. Turney Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews , 2002, ACL.

[12]  Soo-Min Kim,et al.  Automatic Identification of Pro and Con Reasons in Online Reviews , 2006, ACL.

[13]  Andrea Esuli,et al.  Determining Term Subjectivity and Term Orientation for Opinion Mining , 2006, EACL.

[14]  Soo-Min Kim,et al.  Determining the Sentiment of Opinions , 2004, COLING.

[15]  Marti A. Hearst Automatic Acquisition of Hyponyms from Large Text Corpora , 1992, COLING.

[16]  Bing Liu,et al.  Mining and summarizing customer reviews , 2004, KDD.

[17]  Bing Liu,et al.  Opinion observer: analyzing and comparing opinions on the Web , 2005, WWW '05.

[18]  Michael L. Littman,et al.  Unsupervised Learning of Semantic Orientation from a Hundred-Billion-Word Corpus , 2002, ArXiv.

[19]  Masaru Kitsuregawa,et al.  Automatic Construction of Polarity-Tagged Corpus from HTML Documents , 2006, ACL.

[20]  Andrea Esuli,et al.  Determining the semantic orientation of terms through gloss analysis , 2005, CIKM 2005.