From Amazon to Apple: Modeling Online Retail Sales, Purchase Incidence, and Visit Behavior

In this study, we propose a multivariate stochastic model for Web site visit duration, page views, purchase incidence, and the sale amount for online retailers. The model is constructed by composition from carefully selected distributions and involves copula components. It allows the strong nonlinear relationships between the sales and visit variables to be explored in detail, and can be used to construct sales predictions. The model is readily estimated using maximum likelihood, making it an attractive choice in practice given the large sample sizes that are commonplace in online retail studies. We examine a number of top-ranked U.S. online retailers, and find that the visit duration and the number of pages viewed are both related to sales, but in very different ways for different products. Using Bayesian methodology, we show how the model can be extended to a finite mixture model to account for consumer heterogeneity via latent household segmentation. The model can also be adjusted to accommodate a more accurate analysis of online retailers like apple.com that sell products at a very limited number of price points. In a validation study across a range of different Web sites, we find that both purchase incidence and sales amount are forecast more accurately by our model than by regression, probit regression, a popular data-mining method, and a survival model employed previously in an online retail study. Supplementary materials for this article are available online.
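
To make the idea of composing a joint model from chosen margins and a copula concrete, the following is a minimal sketch and not the authors' specification. It assumes a Gaussian copula, a lognormal margin for visit duration, and a geometric margin for page views; the function name `sample_visit` and the parameters `rho`, `mu`, `sigma`, and `p` are hypothetical and chosen only for illustration.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the copula transform


def sample_visit(rho, rng, mu=1.0, sigma=0.8, p=0.25):
    """Draw one (duration, page_views) pair joined by a Gaussian copula.

    All margins and parameter values here are illustrative assumptions.
    """
    # Step 1: two standard normals with correlation rho.
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)

    # Step 2: map to uniforms; this pair carries the copula dependence.
    eps = 1e-12
    u1 = min(max(STD_NORMAL.cdf(z1), eps), 1.0 - eps)
    u2 = min(max(STD_NORMAL.cdf(z2), eps), 1.0 - eps)

    # Step 3: apply each margin's inverse CDF.
    # Assumed lognormal margin for visit duration (continuous, positive).
    duration = math.exp(mu + sigma * STD_NORMAL.inv_cdf(u1))
    # Assumed geometric margin on {1, 2, ...} for pages viewed.
    pages = max(1, math.ceil(math.log(1.0 - u2) / math.log(1.0 - p)))
    return duration, pages


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(sample_visit(0.7, rng))
```

Because the dependence lives entirely in the uniform pair, either margin can be swapped for another distribution without touching the copula step, which is the modularity that a construction by composition relies on.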
