In many applications involving large datasets or online updating, stochastic gradient descent (SGD) provides a scalable way to compute parameter estimates and has gained increasing popularity owing to its numerical convenience and memory efficiency. While the asymptotic properties of SGD-based estimators were established decades ago, statistical inference, such as interval estimation, remains largely unexplored. Traditional resampling methods such as the bootstrap are not computationally feasible, since they require repeatedly drawing independent samples from the entire dataset, and the plug-in method is not applicable when no explicit formula for the covariance matrix of the estimator is available. In this paper, we propose a scalable inferential procedure for stochastic gradient descent that, upon the arrival of each observation, updates the SGD estimate as well as a large number of randomly perturbed SGD estimates. The proposed method is easy to implement in practice. We establish its theoretical properties for a general class of models that includes generalized linear models and quantile regression models as special cases. The finite-sample performance and numerical utility are evaluated through simulation studies and two real data applications.
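To make the procedure concrete, the following is a minimal sketch of the idea of running randomly perturbed SGD trajectories alongside the plain SGD estimate, illustrated on linear regression. The choice of model, the Exp(1) perturbation weights, the step-size schedule, and all variable names are illustrative assumptions, not the paper's exact algorithm or tuning.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (assumed, not from the paper): online least squares,
# dimension d, with B randomly perturbed SGD trajectories updated in parallel.
d, B, n = 3, 200, 5000
theta_true = np.array([1.0, -2.0, 0.5])

theta = np.zeros(d)             # plain SGD estimate
thetas_pert = np.zeros((B, d))  # B perturbed SGD estimates

for t in range(1, n + 1):
    # One streaming observation (x_t, y_t).
    x = rng.normal(size=d)
    y = x @ theta_true + rng.normal()
    gamma = 0.1 * t ** -0.505   # decaying step size gamma_t = c * t^{-alpha}

    # Plain SGD update with the squared-loss gradient.
    theta -= gamma * (x @ theta - y) * x

    # Each perturbed trajectory multiplies its gradient by an i.i.d.
    # nonnegative random weight with mean 1 and variance 1 (here Exp(1)),
    # so all B updates reuse the same single observation.
    w = rng.exponential(size=(B, 1))
    grads_pert = (thetas_pert @ x - y)[:, None] * x[None, :]
    thetas_pert -= gamma * w * grads_pert

# Interval estimates from the spread of the perturbed trajectories,
# e.g. coordinatewise 95% percentile intervals.
lo, hi = np.quantile(thetas_pert, [0.025, 0.975], axis=0)
```

Because every perturbed trajectory consumes each observation once and then discards it, the memory cost is O(Bd) and no pass over the full dataset is ever repeated, which is what makes this feasible where the classical bootstrap is not.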