Beyond the Stars: Improving Rating Predictions using Review Text Content

Online reviews are an important asset for users deciding to buy a product, see a movie, or go to a restaurant, as well as for businesses tracking user feedback. However, most reviews are written in free-text form and are therefore difficult for computer systems to understand, analyze, and aggregate. One consequence of this lack of structure is that searching text reviews is often frustrating for users. User experience would be greatly improved if the structure and sentiment conveyed in the content of the reviews were taken into account. Our work focuses on identifying this information in free-form text reviews and using it to improve the user experience of accessing reviews. Specifically, we focus on improving recommendation accuracy in a restaurant review scenario. In this paper, we report on our classification effort and on the insights into user-reviewing behavior that we gained in the process. We propose new ad-hoc and regression-based recommendation measures, both of which take into account the textual component of user reviews. Our results show that using textual information yields better general or personalized review score predictions than those derived from the numerical star ratings given by the users.
