Mining the peanut gallery: opinion extraction and semantic classification of product reviews

The web contains a wealth of product reviews, but sifting through them is a daunting task. Ideally, an opinion mining tool would process a set of search results for a given item, generate a list of product attributes (quality, features, etc.), and aggregate opinions about each of them (poor, mixed, good). We begin by identifying the unique properties of this problem and then develop a method for automatically distinguishing between positive and negative reviews. Our classifier draws on information retrieval techniques for feature extraction and scoring; the results for the various metrics and heuristics depend on the testing situation. The best methods work as well as or better than traditional machine learning. When operating on individual sentences collected from web searches, performance is limited by noise and ambiguity. But in the context of a complete web-based tool, and aided by a simple method for grouping sentences into attributes, the results are qualitatively quite useful.
