Model-based inference for k-nearest neighbours predictions using a canonical vine copula

Abstract The k-nearest neighbours (k-NN) technique combines field data from forest inventories with auxiliary information to estimate forest resources at various geographical scales. In this study, auxiliary data consisting of Landsat 5 TM satellite imagery and terrain elevations were used to perform k-NN imputations of plot-level above-ground biomass. Following the model-based inference framework, a superpopulation model consisting of a canonical vine copula was constructed from the empirical data, and new samples were generated from the model and used for k-NN predictions. This method allows the sampling distribution of the imputation errors to be constructed and the statistical properties of the k-NN estimator to be assessed. Using a data-splitting procedure, the copula-based approach was assessed against pair-bootstrap resampling. The imputations were performed using k = 1 (where k is the number of neighbours) and using optimal k values selected according to a bias-minimizing criterion. The empirical coverage probabilities of the confidence intervals constructed using the copula-based approach were closer to the nominal coverages than those obtained with the bootstrap. The improvements were due to a significant reduction in bias, although the standard errors were higher than those of the bootstrap. Still, the root mean squared error was significantly reduced. The best results were obtained using the copula approach and k-NN imputations with k = 1.
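
The pair-bootstrap baseline mentioned in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the synthetic data, the Euclidean distance, the function names `knn_predict` and `pair_bootstrap_means`, and the normal-approximation interval are not taken from the study, and the canonical vine copula model itself is not reproduced.

```python
import numpy as np

def knn_predict(x_ref, y_ref, x_target, k=1):
    """Mean response of the k nearest reference units (Euclidean distance, assumed)."""
    # Pairwise distances between target and reference units, shape (n_target, n_ref)
    d = np.linalg.norm(x_target[:, None, :] - x_ref[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return y_ref[nearest].mean(axis=1)

def pair_bootstrap_means(x_ref, y_ref, x_target, k=1, n_boot=200, seed=0):
    """Resample (x, y) pairs with replacement and recompute the mean k-NN prediction."""
    rng = np.random.default_rng(seed)
    n = len(y_ref)
    means = np.empty(n_boot)
    for b in range(n_boot):
        i = rng.integers(0, n, size=n)  # indices of one bootstrap resample
        means[b] = knn_predict(x_ref[i], y_ref[i], x_target, k).mean()
    return means

# Synthetic stand-in for plot data: 3 auxiliary variables and a biomass-like response
rng = np.random.default_rng(1)
x = rng.normal(size=(120, 3))
y = 50 + 10 * x[:, 0] + rng.normal(scale=5, size=120)

# Data splitting: the first 80 units act as reference plots, the rest as targets
x_ref, y_ref = x[:80], y[:80]
x_val, y_val = x[80:], y[80:]

estimate = knn_predict(x_ref, y_ref, x_val, k=1).mean()
boot = pair_bootstrap_means(x_ref, y_ref, x_val, k=1)
se = boot.std(ddof=1)                           # bootstrap standard error
ci = (estimate - 1.96 * se, estimate + 1.96 * se)  # normal-approximation 95% interval
```

In a copula-based variant, the resampling step inside the loop would be replaced by drawing a new sample from the fitted superpopulation model, while the prediction step would stay the same.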
