Abstract

In this article, I show how to fit a generalized linear model to N observations on p variables stored in a relational database, using one sampling query and one aggregation query, as long as a sufficiently large sample of the observations can be stored in memory. The resulting estimator is fully efficient and asymptotically equivalent to the maximum likelihood estimator, so its variance can be estimated from the Fisher information in the usual way. A proof-of-concept implementation uses R with MonetDB and with SQLite and could easily be adapted to other popular databases. I illustrate the approach with examples using taxi-trip data from New York City and factors related to car color in New Zealand. Supplementary materials for this article are available online.
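
The two-query idea can be illustrated with a minimal sketch. This is not the proof-of-concept implementation mentioned above (which uses R); it is a Python and SQLite illustration under stated assumptions: a hypothetical table `obs` with one predictor `x` and a binary outcome `y`, a logistic model, simulated data, an arbitrary sample size of 2000, and a user-defined SQL function `expit` registered from Python. The sampling query pulls a random subsample into memory, the model is fitted there by Newton-Raphson, and a single aggregation query then computes the full-data score and Fisher information at that estimate for a one-step update.

```python
import math
import random
import sqlite3


def expit(z):
    """Inverse logit, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve2(a11, a12, a22, u1, u2):
    """Solve the symmetric 2x2 system [[a11, a12], [a12, a22]] d = (u1, u2)."""
    det = a11 * a22 - a12 * a12
    return (a22 * u1 - a12 * u2) / det, (a11 * u2 - a12 * u1) / det


def score_info(rows, b0, b1):
    """Logistic score and Fisher information summed over in-memory rows."""
    u1 = u2 = a11 = a12 = a22 = 0.0
    for x, y in rows:
        mu = expit(b0 + b1 * x)
        w = mu * (1.0 - mu)
        u1 += y - mu
        u2 += (y - mu) * x
        a11 += w
        a12 += w * x
        a22 += w * x * x
    return u1, u2, a11, a12, a22


def fit_sample(rows, iters=25):
    """Fit the logistic model to the in-memory sample by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        u1, u2, a11, a12, a22 = score_info(rows, b0, b1)
        d0, d1 = solve2(a11, a12, a22, u1, u2)
        b0, b1 = b0 + d0, b1 + d1
    return b0, b1


# Hypothetical data: simulate N = 20000 rows and load them into SQLite.
TRUE_B0, TRUE_B1 = 0.5, 1.0
rng = random.Random(1)
data = []
for _ in range(20000):
    x = rng.gauss(0.0, 1.0)
    y = 1 if rng.random() < expit(TRUE_B0 + TRUE_B1 * x) else 0
    data.append((x, y))

conn = sqlite3.connect(":memory:")
conn.create_function("expit", 1, expit)  # assumption: expit supplied as a UDF
conn.execute("CREATE TABLE obs (x REAL, y INTEGER)")
conn.executemany("INSERT INTO obs VALUES (?, ?)", data)

# Query 1 (sampling): bring a random subsample into memory and fit there.
sample = conn.execute(
    "SELECT x, y FROM obs ORDER BY RANDOM() LIMIT 2000"
).fetchall()
b0, b1 = fit_sample(sample)

# Query 2 (aggregation): full-data score and information at the sample fit.
u1, u2, a11, a12, a22 = conn.execute(
    """
    SELECT SUM(y - mu), SUM((y - mu) * x),
           SUM(mu * (1 - mu)), SUM(mu * (1 - mu) * x),
           SUM(mu * (1 - mu) * x * x)
    FROM (SELECT x, y, expit(? + ? * x) AS mu FROM obs)
    """,
    (b0, b1),
).fetchone()

# One-step update, with standard errors from the inverse Fisher information.
d0, d1 = solve2(a11, a12, a22, u1, u2)
one_step = (b0 + d0, b1 + d1)
det = a11 * a22 - a12 * a12
std_err = (math.sqrt(a22 / det), math.sqrt(a11 / det))

print("sample fit:", (b0, b1))
print("one-step:  ", one_step)
print("std. err.: ", std_err)
```

Only five sums leave the database in the second query, so the full table never has to be held in memory; with p coefficients the aggregation would return p score sums and p(p+1)/2 information sums.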