No Customer Left Behind: A Distribution-Free Bayesian Approach to Accounting for Missing Xs in Marketing Models

In marketing applications, key covariates in a regression model, such as marketing mix variables or consumer profiles, are often subject to missingness. The convenient practice of excluding consumers with a missing value in any covariate can cause a substantial loss of efficiency and may introduce strong selection bias into estimates of consumer preferences and sensitivities. To address these problems, we propose a new distribution-free Bayesian approach that ensures no customer is left behind in the analysis because of missing covariates, so that all customers are considered when devising managerial policies. The proposed approach allows flexible modeling of the joint distribution of multidimensional, interrelated covariates that may include both continuous and discrete variables. At the same time, it minimizes the impact of distributional assumptions in covariate modeling: researchers do not need to specify parametric distributions for the covariates, and the method automatically generates suitable distributions for the missing ones. We have developed an efficient Markov chain Monte Carlo algorithm for inference. Besides robustness and flexibility, the proposed approach reduces the modeling and computational effort associated with missing covariates and therefore makes missing covariate problems easier to handle. We evaluate the performance of the proposed method in extensive simulation studies. We then illustrate the method with two real data examples that contain missing covariates: a mixed multinomial logit discrete-choice model for a ketchup data set and a hierarchical probit purchase incidence model for a retail store data set. These analyses demonstrate that the proposed method overcomes several important limitations of existing approaches to missing covariate problems and offers opportunities to make better managerial decisions with currently available marketing databases.
Although our applications focus on consumer-level data, the proposed method is general and can be applied in other marketing contexts in which other types of marketing players are the units of analysis.
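The two costs of complete-case analysis named above, lost efficiency and selection bias, can be illustrated with a small simulation. This is not the proposed distribution-free Bayesian method. It only shows why dropping incomplete records is costly. The number of covariates, the missingness rate, independence of missingness across covariates, and the data-generating model y = 1 + 2x + e are all hypothetical choices made for illustration.

```python
import random


def complete_case_fraction(n_covariates: int, p_missing: float) -> float:
    """Expected share of customers kept by listwise deletion, ASSUMING each of
    n_covariates is missing independently with probability p_missing."""
    return (1.0 - p_missing) ** n_covariates


def ols_slope(xs, ys):
    """Least-squares slope of y on x in a simple regression with intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate_selection_bias(n=20000, beta=2.0, seed=1):
    """Hypothetical model y = 1 + beta*x + e with x, e ~ N(0, 1).
    x is treated as unobserved whenever y > 1, so missingness in the
    covariate depends on the outcome. Returns the full-data slope, the
    complete-case slope, and the share of records kept."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [1.0 + beta * x + rng.gauss(0.0, 1.0) for x in xs]
    full = ols_slope(xs, ys)
    # Listwise deletion: keep only records whose x was "observed".
    kept = [(x, y) for x, y in zip(xs, ys) if y <= 1.0]
    cc = ols_slope([x for x, _ in kept], [y for _, y in kept])
    return full, cc, len(kept) / n


if __name__ == "__main__":
    # Five covariates, each 10% missing: only about 59% of customers survive.
    print(f"{complete_case_fraction(5, 0.10):.3f}")  # 0.590
    full, cc, kept = simulate_selection_bias()
    print(f"full-data slope {full:.2f}, complete-case slope {cc:.2f}, kept {kept:.0%}")
```

Under this setup the full-data slope stays close to the true value of 2, while the complete-case slope falls to roughly 1.5 because selection on the outcome truncates the sample. Had missingness depended only on x, the complete-case slope would remain approximately unbiased. The bias in this example comes specifically from outcome-dependent missingness.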
