De‐Biasing the Reporting Bias in Social Media Analytics

User-generated content (UGC) in social media, such as online reviews, is inherently incomplete because it does not capture the opinions of users who do not write a review. These silent users may differ systematically from those who speak up. Such differences can be driven by users’ differing sentiments toward their shopping experiences as well as by their disposition to generate UGC. Overlooking silent users’ opinions can result in a reporting bias. We develop a method to model users’ UGC-generating process and then correct this bias through an inverse probability weighting (IPW) approach. In the context of users’ movie review activities at Blockbuster.com, our results show that the average probability that a customer posts a review is 0.06 when the customer is unsatisfied with a movie, 0.23 when indifferent, and 0.32 when satisfied. The distribution of users’ reporting probabilities given a positive experience first-order stochastically dominates the distribution given a negative experience. Our approach provides a realistic solution for business managers to properly utilize incomplete UGC.
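
To make the IPW idea concrete, the sketch below reweights observed review counts by the inverse of the average posting probabilities quoted above (0.06, 0.23, 0.32). It is a deliberately simplified illustration, not the paper's estimation procedure: it assumes a single known posting probability per sentiment group instead of user-level probabilities estimated from a model, and the observed counts and the function name `ipw_shares` are hypothetical.

```python
# Average posting probabilities by sentiment, as quoted in the abstract.
POST_PROB = {"unsatisfied": 0.06, "indifferent": 0.23, "satisfied": 0.32}


def ipw_shares(observed_counts, post_prob=POST_PROB):
    """Estimate population sentiment shares from observed review counts.

    Each observed review is weighted by 1 / P(post | sentiment), so groups
    that rarely post are scaled up more than groups that post often.
    """
    weighted = {k: n / post_prob[k] for k, n in observed_counts.items()}
    total = sum(weighted.values())
    return {k: w / total for k, w in weighted.items()}


# Hypothetical observed review counts (not data from the paper).
observed = {"unsatisfied": 60, "indifferent": 230, "satisfied": 320}

# Naive shares computed only from users who posted a review.
raw_total = sum(observed.values())
raw = {k: n / raw_total for k, n in observed.items()}

# Shares after inverse probability weighting.
corrected = ipw_shares(observed)

for k in observed:
    print(f"{k:12s} raw={raw[k]:.3f} corrected={corrected[k]:.3f}")
```

In this hypothetical sample, unsatisfied customers account for roughly 10% of posted reviews, but after weighting each of the three groups is estimated at about one third of all customers, which is the kind of gap the reporting bias can create.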
