Endogenous post-stratification in surveys: Classifying with a sample-fitted model

Post-stratification is frequently used to improve the precision of survey estimators when categorical auxiliary information is available from sources outside the survey. In natural resource surveys, such information is often obtained from remote sensing data, classified into categories and displayed as pixel-based maps. These maps may be constructed from classification models fitted to the sample data. Post-stratifying the sample data on categories that were themselves derived from the sample data ("endogenous post-stratification") violates two standard post-stratification assumptions: that observations are classified into post-strata without error, and that post-stratum population counts are known. Properties of the endogenous post-stratification estimator are derived for the case of a sample-fitted generalized linear model, from which the post-strata are constructed by dividing the range of the model predictions into predetermined intervals. Design consistency of the endogenous post-stratification estimator is established under mild conditions. Under a super-population model, consistency and asymptotic normality of the endogenous post-stratification estimator are established, showing that it has the same asymptotic variance as the traditional post-stratified estimator with fixed strata. Simulation experiments demonstrate that the practical effect of first fitting a model to the survey data before post-stratifying is small, even for relatively small sample sizes.
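The procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes simple random sampling without replacement, a single auxiliary variable known for every population unit, a logistic regression (one instance of a generalized linear model) fitted by iteratively reweighted least squares, and cut points at 0.25, 0.5 and 0.75. All function names, the simulated population and the cut points are hypothetical choices made for illustration.

```python
import numpy as np

def fit_logistic_irls(x, y, n_iter=25):
    """Fit a logistic regression (logit-link GLM) by IRLS; returns [intercept, slope]."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))
        w = np.clip(mu * (1.0 - mu), 1e-9, None)   # IRLS working weights
        z = eta + (y - mu) / w                      # working response
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
    return beta

def predict(beta, x):
    """Predicted mean response for auxiliary values x."""
    return 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * x)))

def eps_estimate(x_pop, sample_idx, y_sample, cuts):
    """Endogenous post-stratification estimate of the population mean.

    The model is fitted to the sample only; its predictions are then
    computed for every population unit and cut into predetermined
    intervals, which serve as the post-strata.
    """
    beta = fit_logistic_irls(x_pop[sample_idx], y_sample)
    strata_pop = np.digitize(predict(beta, x_pop), cuts)  # stratum label per unit
    strata_smp = strata_pop[sample_idx]
    n_pop = len(x_pop)
    estimate = 0.0
    for h in range(len(cuts) + 1):
        n_h_pop = np.sum(strata_pop == h)
        if n_h_pop == 0:
            continue  # empty population stratum contributes nothing
        in_h = strata_smp == h
        if not in_h.any():
            raise ValueError(f"no sample units in post-stratum {h}")
        # stratum weight (from model-based counts) times sample stratum mean
        estimate += (n_h_pop / n_pop) * y_sample[in_h].mean()
    return estimate

# Hypothetical simulated population (assumption, not data from the paper).
rng = np.random.default_rng(0)
N, n = 10_000, 500
x_pop = rng.uniform(-2.0, 2.0, size=N)
p_true = 1.0 / (1.0 + np.exp(-(0.5 + 1.5 * x_pop)))
y_pop = rng.binomial(1, p_true).astype(float)

sample_idx = rng.choice(N, size=n, replace=False)   # simple random sample
cuts = np.array([0.25, 0.5, 0.75])                  # predetermined intervals
estimate = eps_estimate(x_pop, sample_idx, y_pop[sample_idx], cuts)

print("EPS estimate:   ", estimate)
print("sample mean:    ", y_pop[sample_idx].mean())
print("population mean:", y_pop.mean())
```

Note that the stratum counts used as weights come from the fitted model's predictions over the whole population, so they change from sample to sample; this is the departure from classical post-stratification that the abstract refers to. The sketch raises an error when a non-empty population stratum has no sample units, whereas a production implementation would more likely collapse adjacent strata.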
