The Sense and Non-Sense of Holdout Sample Validation in the Presence of Endogeneity

Market response models based on field-generated data need to address potential endogeneity in the regressors to obtain consistent parameter estimates. Another requirement is that market response models predict well in a holdout sample. With both requirements combined, it may seem reasonable to subject an endogeneity-corrected model to a holdout prediction task, and this is quite common in the academic marketing literature. One may be inclined to expect that the consistent parameter estimates obtained via instrumental variables (IV) estimation predict better than the biased ordinary least squares (OLS) estimates. This paper shows that this expectation is incorrect. That is, if the holdout sample is similar to the estimation sample, so that the regressors are endogenous in both samples, holdout sample validation favors regression estimates that are not corrected for endogeneity (i.e., OLS) over estimates that are corrected for endogeneity (i.e., IV estimation). We also discuss ways in which holdout samples may be used sensibly in the presence of endogeneity. A key takeaway is that if consistent parameter estimates are the primary model objective, the model should be validated with an exogenous rather than endogenous holdout sample.
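The central claim can be illustrated with a small simulation. The sketch below is not the paper's own analysis; it is a minimal example under assumed settings: a single regressor, one valid instrument `z`, a true slope `BETA = 1.0`, and an assumed covariance `RHO = 0.8` between the regressor and the error. All function and variable names are hypothetical. The slope is estimated by OLS and by simple IV on an endogenous estimation sample, and both estimates are then scored on an endogenous and on an exogenous holdout sample.

```python
import random

BETA = 1.0  # true response parameter (assumed for illustration)
RHO = 0.8   # covariance between regressor and error (assumed)

def draw_sample(n, rng, endogenous):
    """Draw (z, x, y) with y = BETA * x + e; all variables have mean zero."""
    zs, xs, ys = [], [], []
    for _ in range(n):
        z = rng.gauss(0, 1)  # instrument: shifts x, unrelated to e
        e = rng.gauss(0, 1)  # structural error
        v = rng.gauss(0, 1)  # independent noise in x
        if endogenous:
            # x depends on e, so cov(x, e) = RHO; var(x) = 2
            x = z + RHO * e + (1 - RHO ** 2) ** 0.5 * v
        else:
            # x is set independently of e; var(x) = 2 as well
            x = z + v
        zs.append(z)
        xs.append(x)
        ys.append(BETA * x + e)
    return zs, xs, ys

def ols_slope(xs, ys):
    """No-intercept OLS slope (valid here because all means are zero)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def iv_slope(zs, xs, ys):
    """Simple IV estimator with one instrument: (z'y) / (z'x)."""
    return sum(z * y for z, y in zip(zs, ys)) / sum(z * x for z, x in zip(zs, xs))

def holdout_mse(b, xs, ys):
    """Mean squared prediction error of the rule y_hat = b * x."""
    return sum((y - b * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def run_experiment(n=20000, seed=0):
    rng = random.Random(seed)
    z_est, x_est, y_est = draw_sample(n, rng, endogenous=True)
    _, x_endo, y_endo = draw_sample(n, rng, endogenous=True)
    _, x_exo, y_exo = draw_sample(n, rng, endogenous=False)
    b_ols = ols_slope(x_est, y_est)
    b_iv = iv_slope(z_est, x_est, y_est)
    return {
        "b_ols": b_ols,
        "b_iv": b_iv,
        "mse_ols_endogenous": holdout_mse(b_ols, x_endo, y_endo),
        "mse_iv_endogenous": holdout_mse(b_iv, x_endo, y_endo),
        "mse_ols_exogenous": holdout_mse(b_ols, x_exo, y_exo),
        "mse_iv_exogenous": holdout_mse(b_iv, x_exo, y_exo),
    }

results = run_experiment()
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

In this assumed design the OLS slope converges to 1.4 rather than the true 1.0, yet its population prediction error on the endogenous holdout is 0.68 against 1.0 for IV, because the biased slope also absorbs the predictable part of the error. On the exogenous holdout the ranking reverses (about 1.32 for OLS against 1.0 for IV), which is consistent with the recommendation to validate with an exogenous holdout sample when consistent estimates are the objective.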
