Fast Generalized Linear Models by Database Sampling and One-Step Polishing

Abstract: In this article, I show how to fit a generalized linear model to N observations on p variables stored in a relational database, using one sampling query and one aggregation query, as long as $N^{1/2+\delta}$ observations can be stored in memory, for some $\delta > 0$. The resulting estimator is fully efficient and asymptotically equivalent to the maximum likelihood estimator, so its variance can be estimated from the Fisher information in the usual way. A proof-of-concept implementation uses R with MonetDB and with SQLite, and could easily be adapted to other popular databases. I illustrate the approach with examples of taxi-trip data in New York City and factors related to car color in New Zealand. Supplementary materials for this article are available online.
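The two-query idea can be illustrated with a minimal sketch. It is not the article's R/MonetDB implementation. It assumes a logistic regression with an intercept and one covariate, SQLite through Python's standard `sqlite3` module, and a hypothetical table `obs(x, y)`. The sampling query draws a subsample whose size grows faster than $\sqrt{N}$; here it is $N^{0.75}$, an arbitrary illustrative choice. That subsample is fitted in memory to give a pilot estimate $\tilde\beta$. The aggregation query then returns the full-data score $U(\tilde\beta)$ and Fisher information $I(\tilde\beta)$, and one Newton (Fisher scoring) step $\hat\beta = \tilde\beta + I(\tilde\beta)^{-1} U(\tilde\beta)$ polishes the estimate. The helper names (`expit`, `newton_logistic`, `one_step_glm`, `make_db`) are hypothetical.

```python
import math
import random
import sqlite3


def expit(eta):
    """Logistic function; registered with SQLite so the aggregate query can call it."""
    return 1.0 / (1.0 + math.exp(-eta))


def solve2(a, b, c, u0, u1):
    """Solve [[a, b], [b, c]] @ d = [u0, u1] for a symmetric 2x2 information matrix."""
    det = a * c - b * b
    return (c * u0 - b * u1) / det, (a * u1 - b * u0) / det


def newton_logistic(rows, iters=25, tol=1e-10):
    """Ordinary in-memory Newton-Raphson for logistic regression y ~ 1 + x."""
    b0 = b1 = 0.0
    for _ in range(iters):
        u0 = u1 = i00 = i01 = i11 = 0.0
        for x, y in rows:
            p = expit(b0 + b1 * x)
            w = p * (1.0 - p)          # variance weight for the canonical logit link
            u0 += y - p                # score components
            u1 += (y - p) * x
            i00 += w                   # Fisher information components
            i01 += w * x
            i11 += w * x * x
        d0, d1 = solve2(i00, i01, i11, u0, u1)
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def one_step_glm(conn, n_sample):
    """Pilot fit on a random subsample, then one full-data scoring step in SQL."""
    # Query 1 (sampling): simple random sample of n_sample rows, fitted in memory.
    sample = conn.execute(
        "SELECT x, y FROM obs ORDER BY RANDOM() LIMIT ?", (n_sample,)
    ).fetchall()
    b0, b1 = newton_logistic(sample)
    # Query 2 (aggregation): score and information over ALL rows at the pilot estimate.
    u0, u1, i00, i01, i11 = conn.execute(
        """
        SELECT SUM(y - p), SUM((y - p) * x),
               SUM(p * (1 - p)), SUM(p * (1 - p) * x), SUM(p * (1 - p) * x * x)
        FROM (SELECT x, y, expit(? + ? * x) AS p FROM obs)
        """,
        (b0, b1),
    ).fetchone()
    d0, d1 = solve2(i00, i01, i11, u0, u1)
    return b0 + d0, b1 + d1


def make_db(n_rows=20000, beta=(-0.5, 1.0), seed=1):
    """Simulated example table; the coefficients and seed are arbitrary."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.create_function("expit", 1, expit)
    conn.execute("CREATE TABLE obs (x REAL, y INTEGER)")
    rows = []
    for _ in range(n_rows):
        x = rng.gauss(0.0, 1.0)
        y = 1 if rng.random() < expit(beta[0] + beta[1] * x) else 0
        rows.append((x, y))
    conn.executemany("INSERT INTO obs VALUES (?, ?)", rows)
    return conn


if __name__ == "__main__":
    N = 20000
    conn = make_db(N)
    print("one-step:", one_step_glm(conn, int(N ** 0.75)))
    print("full MLE:", newton_logistic(conn.execute("SELECT x, y FROM obs").fetchall()))
```

The aggregate query reads every row once but keeps only five running sums, so memory use is governed by the subsample. `expit` is registered with `create_function` because SQLite's built-in `exp()` is present only in builds compiled with math functions enabled.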

Thomas Lumley
