Mining and summarizing customer reviews

Merchants selling products on the Web often ask their customers to review the products that they have purchased and the associated services. As e-commerce is becoming more and more popular, the number of customer reviews that a product receives grows rapidly. For a popular product, the number of reviews can be in hundreds or even thousands. This makes it difficult for a potential customer to read them to make an informed decision on whether to purchase the product. It also makes it difficult for the manufacturer of the product to keep track and to manage customer opinions. For the manufacturer, there are additional difficulties because many merchant sites may sell the same product and the manufacturer normally produces many kinds of products. In this research, we aim to mine and to summarize all the customer reviews of a product. This summarization task is different from traditional text summarization because we only mine the features of the product on which the customers have expressed their opinions and whether the opinions are positive or negative. We do not summarize the reviews by selecting a subset or rewrite some of the original sentences from the reviews to capture the main points as in the classic text summarization. Our task is performed in three steps: (1) mining product features that have been commented on by customers; (2) identifying opinion sentences in each review and deciding whether each opinion sentence is positive or negative; (3) summarizing the results. This paper proposes several novel techniques to perform these tasks. Our experimental results using reviews of a number of products sold online demonstrate the effectiveness of the techniques.

[1]  J. Jenkins,et al.  Word association norms , 1964 .

[2]  Kenneth Ward Church,et al.  Word Association Norms, Mutual Information, and Lexicography , 1989, ACL.

[3]  George A. Miller,et al.  Introduction to WordNet: An On-line Lexical Database , 1990 .

[4]  Chris D. Paice,et al.  Constructing literature abstracts by computer: Techniques and prospects , 1990, Inf. Process. Manag..

[5]  Marti A. Hearst Direction-based text interpretation as an information access refinement , 1992 .

[6]  Karen Spärck Jones What Might be in a Summary? , 1993, Information Retrieval.

[7]  Jussi Karlgren,et al.  Recognizing Text Genres With Simple Metrics Using Discriminant Analysis , 1994, COLING.

[8]  Béatrice Daille,et al.  Study and Implementation of Combined Techniques for Automatic Extraction of Terminology , 1994 .

[9]  Warren Sack,et al.  On the Computation of Point of View , 1994, AAAI.

[10]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[11]  Slava M. Katz,et al.  Technical terminology: some linguistic properties and an algorithm for identification in text , 1995, Natural Language Engineering.

[12]  Francine Chen,et al.  A trainable document summarizer , 1995, SIGIR '95.

[13]  Gerard Salton,et al.  Automatic text decomposition using text segments and text themes , 1996, HYPERTEXT '96.

[14]  Judith L. Klavans,et al.  Book Reviews: The Balancing Act: Combining Symbolic and Statistical Approaches to Language , 1997, CL.

[15]  Branimir K. Boguraev,et al.  Salience-based Content Characterisafion of Text Documents , 1997 .

[16]  Inderjeet Mani,et al.  Multi-Document Summarization by Graph Search and Matching , 1997, AAAI/IAAI.

[17]  U. Reimer,et al.  A Formal Model of Text Summarization Based on Condensation Operators of a Terminological Logic , 1997 .

[18]  Vasileios Hatzivassiloglou,et al.  Predicting the Semantic Orientation of Adjectives , 1997, ACL.

[19]  Hinrich Schütze,et al.  Automatic Detection of Text Genre , 1997, ACL.

[20]  Wynne Hsu,et al.  Integrating Classification and Association Rule Mining , 1998, KDD.

[21]  Jade Goldstein-Stewart,et al.  Summarizing text documents: sentence selection and evaluation metrics , 1999, SIGIR '99.

[22]  Joel R. Tetreault Analysis of Syntax-Based Pronoun Resolution Methods , 1999, ACL.

[23]  Janyce Wiebe,et al.  Recognizing subjectivity: a case study in manual tagging , 1999, Natural Language Engineering.

[24]  Janyce Wiebe,et al.  Development and Use of a Gold-Standard Data Set for Subjectivity Classifications , 1999, ACL.

[25]  Janyce Wiebe,et al.  Effects of Adjective Orientation and Gradability on Sentence Subjectivity , 2000, COLING.

[26]  Janyce Wiebe,et al.  Learning Subjective Adjectives from Corpora , 2000, AAAI/IAAI.

[27]  Alison Huettner,et al.  Fuzzy Typing for Document Management , 2000 .

[28]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[29]  Hinrich Schütze,et al.  Book Reviews: Foundations of Statistical Natural Language Processing , 1999, CL.

[30]  Bo Pang,et al.  Thumbs up? Sentiment Classification using Machine Learning Techniques , 2002, EMNLP.

[31]  N. Kushmerick,et al.  Genre Classification and Domain Transfer for Information Filtering , 2002, ECIR.

[32]  Satoshi Morinaga,et al.  Mining product reputations on the Web , 2002, KDD.

[33]  Peter D. Turney Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews , 2002, ACL.

[34]  Claire Cardie,et al.  Combining Low-Level and Summary Representations of Opinions for Multi-Perspective Question Answering , 2003, New Directions in Question Answering.

[35]  David M. Pennock,et al.  Mining the peanut gallery: opinion extraction and semantic classification of product reviews , 2003, WWW '03.

[36]  Bing Liu,et al.  Mining Opinion Features in Customer Reviews , 2004, AAAI.

[37]  Christian Jacquemin,et al.  Term Extraction and Automatic Indexing , 2005 .

[38]  Aidan Finn,et al.  Learning to classify documents according to genre , 2006, J. Assoc. Inf. Sci. Technol..