Estimating the Socio-Economic Impact of Product Reviews: Mining Text and Reviewer Characteristics

With the rapid growth of the Internet, the ability of users to create and publish content has created active electronic communities that provide a wealth of product information. However, the high volume of reviews typically published for a single product makes it harder for individuals as well as manufacturers to locate the best reviews and understand the true underlying quality of a product. In this paper, we re-examine the impact of reviews on economic outcomes such as product sales and study how different factors affect social outcomes such as the perceived usefulness of reviews. Our approach examines review text at multiple levels (lexical, grammatical, semantic, and stylistic) to identify important text-based features. We also examine multiple reviewer-level features, such as the average usefulness of a reviewer's past reviews and the self-disclosed identity measures that are displayed next to a review. Our econometric analysis reveals that the extent of subjectivity, informativeness, readability, and linguistic correctness in reviews matters in influencing sales and perceived usefulness. Reviews that mix objective and highly subjective sentences have a negative effect on product sales, compared to reviews that tend to include only subjective or only objective information. However, such reviews are considered more informative (or helpful) by users. Using Random Forest based classifiers, we show that we can accurately predict the impact of reviews on sales and their perceived usefulness. Products that have received widely fluctuating reviews also have reviews of widely fluctuating helpfulness. In particular, we find that highly detailed and readable reviews can receive low helpfulness votes when users vote negatively not because they disapprove of the review quality but to convey their disapproval of the review polarity.
We examine the relative importance of the three broad feature categories ('reviewer-related' features, 'review subjectivity' features, and 'review readability' features) and find that using any one of the three feature sets yields performance statistically equivalent to using all available features. This paper is the first study that integrates econometric, text mining, and predictive modeling techniques toward a more complete analysis of the information captured by user-generated online reviews in order to estimate their socio-economic impact. Our results can have implications for the judicious design of opinion forums.
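
As a rough illustration of what a 'review readability' feature might look like, the sketch below computes a few surface statistics and the Automated Readability Index (ARI) for a review. This is only a minimal sketch under stated assumptions: the abstract does not say which readability measures were used, so the choice of ARI, the function name `readability_features`, and the simple regex-based tokenization are all hypothetical.

```python
import re


def readability_features(review: str) -> dict:
    """Compute simple readability statistics for one review.

    Hypothetical helper: ARI and the regex tokenization are assumptions,
    not the feature set described in the paper.
    """
    # Split into sentences on runs of terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", review) if s.strip()]
    # Treat runs of letters, digits, and apostrophes as words.
    words = re.findall(r"[A-Za-z0-9']+", review)

    # Guard against division by zero on empty input.
    n_sent = max(len(sentences), 1)
    n_words = max(len(words), 1)
    # ARI counts letters and digits only, so apostrophes are excluded.
    n_chars = sum(ch.isalnum() for w in words for ch in w)

    chars_per_word = n_chars / n_words
    words_per_sentence = n_words / n_sent
    # Standard ARI formula.
    ari = 4.71 * chars_per_word + 0.5 * words_per_sentence - 21.43

    return {
        "sentences": len(sentences),
        "words": len(words),
        "chars_per_word": chars_per_word,
        "words_per_sentence": words_per_sentence,
        "ari": ari,
    }


if __name__ == "__main__":
    print(readability_features("The camera is great. It takes sharp photos!"))
```

Features of this kind could then be passed, alongside reviewer-related and subjectivity features, to any off-the-shelf classifier; the classifier itself is deliberately left out of this sketch.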
