Online Statistical Inference for Parameters Estimation with Linear-Equality Constraints

Stochastic gradient descent (SGD) and projected stochastic gradient descent (PSGD) are scalable algorithms for computing model parameters in unconstrained and constrained optimization problems, respectively. Compared with SGD, PSGD forces its iterates into the constrained parameter space via projection. The convergence rate of PSGD-type estimates has been studied exhaustively, while statistical properties such as the asymptotic distribution remain less explored. From a purely statistical point of view, this paper studies the limiting distribution of the PSGD-based estimate when the true parameters satisfy some linear-equality constraints. Our theoretical findings reveal the role that projection plays in the uncertainty of the PSGD estimate. As a byproduct, we propose an online hypothesis testing procedure to test the linear-equality constraints. Simulation studies on synthetic data and an application to a real-world dataset confirm our theory.
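To make the projection step concrete, the following is a minimal sketch of PSGD under a single linear-equality constraint a·θ = b. It is an illustration under stated assumptions, not the paper's procedure: the quadratic loss, the step-size schedule, the use of iterate averaging, and the names `project_onto_hyperplane` and `psgd_mean` are all hypothetical choices made here.

```python
import random


def project_onto_hyperplane(theta, a, b):
    """Euclidean projection of theta onto the set {x : a.x = b}."""
    residual = sum(ai * ti for ai, ti in zip(a, theta)) - b
    scale = residual / sum(ai * ai for ai in a)
    return [ti - scale * ai for ti, ai in zip(theta, a)]


def psgd_mean(samples, a, b, step=lambda t: 0.5 * t ** -0.6):
    """Averaged PSGD for the loss 0.5 * ||theta - x||^2 subject to a.theta = b.

    The step-size schedule and the running average of the iterates are
    assumptions of this sketch, not details taken from the abstract.
    """
    dim = len(a)
    theta = project_onto_hyperplane([0.0] * dim, a, b)  # feasible start
    average = [0.0] * dim
    for t, x in enumerate(samples, start=1):
        grad = [ti - xi for ti, xi in zip(theta, x)]        # stochastic gradient
        theta = [ti - step(t) * gi for ti, gi in zip(theta, grad)]
        theta = project_onto_hyperplane(theta, a, b)        # projection step
        average = [avg + (ti - avg) / t for avg, ti in zip(average, theta)]
    return average


# Synthetic data whose true mean satisfies theta_1 + theta_2 = 3.
rng = random.Random(0)
true_theta = [1.0, 2.0]
a, b = [1.0, 1.0], 3.0
samples = [[m + rng.gauss(0.0, 1.0) for m in true_theta] for _ in range(20000)]
estimate = psgd_mean(samples, a, b)
```

Because every iterate is projected onto the constraint set and that set is affine, the averaged estimate also satisfies the constraint. Randomness in the estimate therefore remains only along directions within the constraint set, which is one informal way to see why projection affects the uncertainty of the estimate.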
