Fast moment‐based estimation for hierarchical models

Summary. Hierarchical models allow for heterogeneous behaviours in a population while simultaneously borrowing estimation strength across all subpopulations. Unfortunately, existing likelihood-based methods for fitting hierarchical models have high computational demands, and these demands have limited their adoption in large-scale prediction and inference problems. The paper proposes a moment-based procedure for estimating the parameters of a hierarchical model which has its roots in a method originally introduced by Cochran in 1937. The method trades statistical efficiency for computational efficiency. It gives consistent parameter estimates, competitive prediction error performance and substantial computational improvements. When applied to a large-scale recommender system application and compared with a standard maximum likelihood procedure, the method delivers competitive prediction performance while reducing the sequential computation time from hours to minutes.
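To make the idea of moment-based fitting concrete, the sketch below treats only the simplest special case, a random-intercept model y_ij = mu + b_i + e_ij with Var(b_i) = tau2 and Var(e_ij) = sigma2. It is not the procedure proposed in the paper. It is a textbook-style unweighted moment estimator in the same spirit, and the function name `fit_random_intercept_moments` and the returned keys are hypothetical choices made for illustration. Each step is a closed-form sum over groups, so a single pass over the data suffices and no iterative likelihood optimisation is needed.

```python
from collections import defaultdict


def fit_random_intercept_moments(y, group):
    """Moment estimates for y_ij = mu + b_i + e_ij (illustrative sketch only).

    Assumes at least two groups and at least one group with replicates.
    """
    by_group = defaultdict(list)
    for value, g in zip(y, group):
        by_group[g].append(float(value))

    labels = list(by_group)
    m = len(labels)
    n = [len(by_group[g]) for g in labels]
    total = sum(n)
    if m < 2 or total <= m:
        raise ValueError("need at least two groups and some replication")

    # Step 1: per-group means (the group-level "estimates" to be combined).
    ybar = [sum(by_group[g]) / k for g, k in zip(labels, n)]

    # Step 2: pooled within-group variance estimates sigma2.
    rss = sum((v - mean) ** 2
              for g, mean in zip(labels, ybar)
              for v in by_group[g])
    sigma2 = rss / (total - m)

    # Step 3: the sample variance of the group means has expectation
    # tau2 + sigma2 * mean(1 / n_i); subtract the noise part, truncate at 0.
    grand = sum(ybar) / m
    s_between = sum((b - grand) ** 2 for b in ybar) / (m - 1)
    tau2 = max(0.0, s_between - sigma2 * sum(1.0 / k for k in n) / m)

    # Step 4: precision-weighted mean of the group means estimates mu.
    # (Degenerate data with sigma2 == tau2 == 0 is not handled here.)
    w = [1.0 / (tau2 + sigma2 / k) for k in n]
    mu = sum(wi * b for wi, b in zip(w, ybar)) / sum(w)

    # Step 5: shrink each group mean towards mu; small groups shrink more.
    effects = {g: tau2 / (tau2 + sigma2 / k) * (b - mu)
               for g, k, b in zip(labels, n, ybar)}

    return {"mu": mu, "sigma2": sigma2, "tau2": tau2, "effects": effects}


if __name__ == "__main__":
    fit = fit_random_intercept_moments([1, 3, 5, 7], ["a", "a", "b", "b"])
    print(fit)
```

The truncation at zero in step 3 is one common convention for keeping the variance estimate non-negative. The shrinkage in step 5 is what "borrowing strength" amounts to in this special case: a group's predicted effect is pulled towards the overall mean in proportion to how noisy its own mean is.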
