Mining Online Reviews for Predicting Sales Performance: A Case Study in the Movie Domain

Posting reviews online has become an increasingly popular way for people to express opinions and sentiments toward the products they have bought or the services they have received. Analyzing the large volume of online reviews available can produce useful, actionable knowledge of economic value to vendors and other interested parties. In this paper, we conduct a case study in the movie domain and tackle the problem of mining reviews to predict product sales performance. Our analysis shows that both the sentiments expressed in the reviews and the quality of the reviews have a significant impact on the future sales performance of the products in question. For the sentiment factor, we propose Sentiment PLSA (S-PLSA), in which a review is considered a document generated by a number of hidden sentiment factors, in order to capture the complex nature of sentiments. Training an S-PLSA model enables us to obtain a succinct summary of the sentiment information embedded in the reviews. Based on S-PLSA, we propose ARSA, an Autoregressive Sentiment-Aware model for sales prediction. We then seek to further improve the accuracy of prediction by considering the quality factor, with a focus on predicting the quality of a review in the absence of user-supplied indicators, and present ARSQA, an Autoregressive Sentiment and Quality Aware model, which uses both sentiments and quality to predict product sales performance. Extensive experiments conducted on a large movie data set confirm the effectiveness of the proposed approach.
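The abstract does not give the functional form of ARSA, so the following is only a generic sketch of what an autoregressive, sentiment-aware regression might look like, not the authors' model. It assumes that each day's sales are regressed on the previous p days of sales and the previous q days of a K-dimensional sentiment summary, fitted by ordinary least squares. In the paper the sentiment summary would come from S-PLSA; here it is simply an input array. The function names, the lag parameters `p` and `q`, and the fitting method are all hypothetical choices made for illustration.

```python
import numpy as np

def build_design(sales, sentiment, p, q):
    """Stack p lagged sales values and q lagged sentiment vectors per time step.

    sales:     length-T sequence of daily sales figures
    sentiment: (T, K) array, one K-dimensional sentiment summary per day
    (Both inputs and their layout are assumptions for this sketch.)
    """
    sales = np.asarray(sales, dtype=float)
    sentiment = np.asarray(sentiment, dtype=float)
    start = max(p, q)
    rows = []
    for t in range(start, len(sales)):
        past_sales = sales[t - p:t][::-1]              # y_{t-1}, ..., y_{t-p}
        past_sent = sentiment[t - q:t][::-1].ravel()   # day t-1 first, then t-2, ...
        rows.append(np.concatenate([past_sales, past_sent]))
    return np.array(rows), sales[start:]

def fit(sales, sentiment, p=2, q=1):
    """Estimate the regression coefficients by ordinary least squares."""
    X, y = build_design(sales, sentiment, p, q)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict_next(sales, sentiment, coef, p=2, q=1):
    """One-step-ahead forecast from the most recent p sales and q sentiment rows."""
    sales = np.asarray(sales, dtype=float)
    sentiment = np.asarray(sentiment, dtype=float)
    x = np.concatenate([sales[-p:][::-1], sentiment[-q:][::-1].ravel()])
    return float(x @ coef)
```

A call such as `coef = fit(sales, sentiment, p=2, q=1)` followed by `predict_next(sales, sentiment, coef, p=2, q=1)` would give a forecast for the next day. Extending the feature vector with a per-day review quality score would be one way to approximate the quality-aware variant the abstract describes, again as an assumption rather than the published formulation.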
