An experimental investigation of scanner data preparation strategies for consumer choice models

Abstract

Over the past two decades, marketing scientists in academia and industry have used consumer choice models calibrated on supermarket scanner data to assess the impact of price and promotion on consumer choice, and they continue to do so today. Despite the extensive use of scanner panel data for choice modeling, very little is known about how data preparation strategies affect the results of modeling efforts. In most cases, scanner panel data are pruned before model estimation. Pruning eliminates less significant brands, sizes, product forms, and similar entities. It also removes households whose purchase histories are too short to inform key consumer behavior concepts such as loyalty, variety seeking, and brand consideration. Product entity aggregation is also usually part of data preparation, since many product categories offer hundreds of SKUs as choice alternatives. This study conducts an extensive simulation experiment to investigate the effects of data pruning and entity aggregation strategies on estimated price and promotion sensitivities. It also manipulates characteristics of the data that may moderate the effects of these strategies. The results show that data preparation strategies can introduce significant bias into estimated parameters. Based on these results, we offer recommendations on how model builders can prepare scanner panel data to avoid significant biases in estimated price and promotion responses.
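The abstract names two preparation steps: pruning (minor brands and short-history households) and aggregating SKUs into brand-level alternatives. The following is a minimal Python sketch of those steps, not the study's procedure. Several details are hypothetical: the `Purchase` record layout, the function names `prune_minor_brands`, `prune_short_histories`, and `aggregate_to_brand`, the sample data, and the thresholds (a 20% brand share and at least two purchases per household). The brand-level price is the mean price over purchase records, which is only one of several possible aggregation rules.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Purchase:
    # Hypothetical record layout for one scanner panel purchase occasion.
    household: str
    sku: str
    brand: str
    price: float
    promo: bool


def prune_minor_brands(purchases, min_share):
    """Drop purchase records of brands whose purchase share is below min_share."""
    counts = Counter(p.brand for p in purchases)
    total = sum(counts.values())
    keep = {b for b, c in counts.items() if c / total >= min_share}
    return [p for p in purchases if p.brand in keep]


def prune_short_histories(purchases, min_purchases):
    """Drop households with fewer than min_purchases remaining records."""
    counts = Counter(p.household for p in purchases)
    return [p for p in purchases if counts[p.household] >= min_purchases]


def aggregate_to_brand(purchases):
    """Collapse SKU-level records to brand-level choices.

    Returns (choices, brand_price): choices lists (household, brand, promo)
    in input order; brand_price maps each brand to the mean price over its
    purchase records (an illustrative rule, assumed here).
    """
    totals = defaultdict(float)
    counts = Counter()
    choices = []
    for p in purchases:
        choices.append((p.household, p.brand, p.promo))
        totals[p.brand] += p.price
        counts[p.brand] += 1
    brand_price = {b: totals[b] / counts[b] for b in totals}
    return choices, brand_price


# Hypothetical sample panel: brand C holds 1 of 7 purchases (about 14%).
sample = [
    Purchase("h1", "A-16oz", "A", 2.00, False),
    Purchase("h1", "A-32oz", "A", 3.00, True),
    Purchase("h1", "B-16oz", "B", 2.50, False),
    Purchase("h2", "A-16oz", "A", 2.00, False),
    Purchase("h3", "C-16oz", "C", 1.00, True),
    Purchase("h3", "A-16oz", "A", 2.20, False),
    Purchase("h3", "B-16oz", "B", 2.40, True),
]

# Prune brands first, then households, then aggregate SKUs to brands.
kept = prune_short_histories(prune_minor_brands(sample, min_share=0.2), min_purchases=2)
choices, brand_price = aggregate_to_brand(kept)
```

Here the sketch drops brand C and then household h2, which has only one purchase, leaving five brand-level choices. The brand filter runs before the history filter, so a household can fall below the purchase threshold only because its minor-brand purchases were removed. The order of the steps is therefore itself a preparation choice.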
