Model-Assisted Survey Regression Estimation with the Lasso

In the U.S. Forest Service’s Forest Inventory and Analysis (FIA) program, as in other natural resource surveys, many auxiliary variables are available for use in model-assisted inference about finite population parameters. Some of this auxiliary information may be extraneous, and therefore model selection is appropriate to improve the efficiency of the survey regression estimators of finite population totals. A model-assisted survey regression estimator using the lasso is presented and extended to the adaptive lasso. For a sequence of finite populations and probability sampling designs, asymptotic properties of the lasso survey regression estimator are derived, including design consistency and central limit theory for the estimator and design consistency of a variance estimator. To estimate multiple finite population quantities with the method, lasso survey regression weights are developed, using both a model calibration approach and a ridge regression approximation. The gains in efficiency of the lasso estimator over the full regression estimator are demonstrated through a simulation study estimating tree canopy cover for a region in Utah.
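
As a rough illustration of the kind of estimator described above, the following sketch assumes the usual model-assisted difference form: lasso-based predictions summed over the whole population, plus an inverse-probability-weighted sum of sample residuals. The design-weighted lasso criterion, the plain coordinate-descent solver, the function names (`soft_threshold`, `weighted_lasso`, `lasso_survey_total`), the simulated population, and the penalty value are all assumptions made for the example and are not taken from the paper.

```python
import random


def soft_threshold(z, g):
    """Soft-thresholding operator used by lasso coordinate descent."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def weighted_lasso(X, y, w, lam, n_iter=200):
    """Minimise sum_i w_i (y_i - b0 - x_i'b)^2 + lam * sum_j |b_j|.

    Plain coordinate descent; the intercept b0 is not penalised.
    Assumes no covariate column is identically zero.
    """
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    w_sum = sum(w)
    for _ in range(n_iter):
        # Intercept update: weighted mean of residuals computed without b0.
        b0 = sum(
            w[i] * (y[i] - sum(X[i][k] * b[k] for k in range(p)))
            for i in range(n)
        ) / w_sum
        for j in range(p):
            rho, z = 0.0, 0.0
            for i in range(n):
                # Partial residual that leaves out covariate j.
                partial = y[i] - b0 - sum(
                    X[i][k] * b[k] for k in range(p) if k != j
                )
                rho += w[i] * X[i][j] * partial
                z += w[i] * X[i][j] ** 2
            b[j] = soft_threshold(rho, lam / 2.0) / z
    return b0, b


def lasso_survey_total(X_pop, X_s, y_s, pi_s, lam):
    """Model-assisted estimate of a population total (assumed difference form).

    X_pop: covariates for every population unit; X_s, y_s: sampled units;
    pi_s: first-order inclusion probabilities of the sampled units.
    """
    w = [1.0 / p for p in pi_s]  # design weights
    b0, b = weighted_lasso(X_s, y_s, w, lam)

    def predict(x):
        return b0 + sum(xk * bk for xk, bk in zip(x, b))

    pop_part = sum(predict(x) for x in X_pop)
    resid_part = sum(
        wi * (yi - predict(xi)) for wi, yi, xi in zip(w, y_s, X_s)
    )
    return pop_part + resid_part, b0, b


# Toy population: only the first two of four covariates affect y.
random.seed(1)
N, n, p = 300, 60, 4
X_pop = [[random.gauss(0, 1) for _ in range(p)] for _ in range(N)]
y_pop = [10 + 3 * x[0] - 2 * x[1] + random.gauss(0, 1) for x in X_pop]

# Simple random sample without replacement, so pi_i = n / N for every unit.
sample = random.sample(range(N), n)
est, b0, b = lasso_survey_total(
    X_pop,
    [X_pop[i] for i in sample],
    [y_pop[i] for i in sample],
    [n / N] * n,
    lam=100.0,  # arbitrary penalty chosen for illustration
)
print("lasso survey regression estimate:", est)
print("true population total:", sum(y_pop))
print("coefficients (extraneous ones may be shrunk to zero):", b)
```

With a very large penalty every slope is set to zero, and under equal inclusion probabilities the sketch reduces to the Horvitz-Thompson estimator; with no penalty it reduces to a design-weighted regression estimator. In practice the penalty would be chosen by a data-driven criterion rather than fixed by hand.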
