Nonparametric inference via bootstrapping the debiased estimator

In this paper, we propose constructing confidence bands by bootstrapping the debiased kernel density estimator (for density estimation) and the debiased local polynomial regression estimator (for regression analysis). The idea of using a debiased estimator was recently employed by Calonico et al. (2018b) to construct a confidence interval for the density function (or regression function) at a given point by explicitly estimating stochastic variations. We extend their idea of using the debiased estimator and further propose a bootstrap approach for constructing simultaneous confidence bands. This modified method has the advantage that the smoothing bandwidth can be chosen with conventional bandwidth selectors while the confidence band remains asymptotically valid. We prove the validity of the bootstrap confidence band and generalize it to density level sets and inverse regression problems. Simulation studies confirm the validity of the proposed confidence bands/sets. We apply our approach to an astronomy dataset to show its applicability.
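
To make the density case concrete, the following is a minimal Python sketch of a bootstrap confidence band around a debiased kernel density estimator. It is an illustration under stated assumptions, not the paper's implementation: it assumes a one-dimensional sample, a Gaussian kernel (whose second moment is 1), the same bandwidth for the density estimate and for the second-derivative estimate used in the bias correction, and the normal-reference rule of thumb standing in for a conventional bandwidth selector. The function names `normal_reference_bandwidth`, `debiased_kde`, and `bootstrap_band` are hypothetical.

```python
import numpy as np


def normal_reference_bandwidth(data):
    """Rule-of-thumb bandwidth h = 1.06 * sd * n^(-1/5) (assumed selector)."""
    return 1.06 * np.std(data, ddof=1) * data.size ** (-0.2)


def debiased_kde(x_grid, data, h):
    """Gaussian KDE minus its estimated leading bias term.

    Assumption: the second derivative is estimated with the same bandwidth h,
    so the estimator is p_hat(x) - 0.5 * h^2 * p_hat''(x).
    """
    u = (x_grid[:, None] - data[None, :]) / h          # shape (grid, n)
    phi = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)  # Gaussian kernel K(u)
    p_hat = phi.mean(axis=1) / h
    # K''(u) = (u^2 - 1) * K(u) for the Gaussian kernel.
    p_hat_dd = ((u ** 2 - 1.0) * phi).mean(axis=1) / h ** 3
    return p_hat - 0.5 * h ** 2 * p_hat_dd


def bootstrap_band(data, x_grid, h, alpha=0.05, n_boot=500, seed=0):
    """Simultaneous band: center +/- t, where t is the (1 - alpha) quantile
    of the bootstrap sup-deviation, approximated by a maximum over x_grid."""
    rng = np.random.default_rng(seed)
    center = debiased_kde(x_grid, data, h)
    sup_dev = np.empty(n_boot)
    for b in range(n_boot):
        # Resample the data with replacement (empirical bootstrap).
        resample = rng.choice(data, size=data.size, replace=True)
        boot = debiased_kde(x_grid, resample, h)
        sup_dev[b] = np.max(np.abs(boot - center))
    t = np.quantile(sup_dev, 1.0 - alpha)
    return center - t, center + t, t


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    sample = rng.normal(size=300)
    grid = np.linspace(-4.0, 4.0, 161)
    h = normal_reference_bandwidth(sample)
    lower, upper, half_width = bootstrap_band(sample, grid, h, n_boot=200)
    print("bandwidth:", h, "half-width of band:", half_width)
```

Because the band is evaluated on a finite grid, the supremum over the domain is approximated by a maximum over grid points; a finer grid tightens that approximation at a higher cost per bootstrap replicate.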

[1]  M. Wand,et al.  ASYMPTOTICS FOR GENERAL MULTIVARIATE KERNEL DENSITY DERIVATIVE ESTIMATORS , 2011 .

[2]  N. Bissantz,et al.  Confidence bands for inverse regression models , 2010 .

[3]  Y. Wadadekar,et al.  THE SIXTH DATA RELEASE OF THE SLOAN DIGITAL SKY SURVEY , 2022 .

[4]  Gunnar E. Carlsson,et al.  Topology and data , 2009 .

[5]  Rebecca Nugent,et al.  Stability of density-based clustering , 2010, J. Mach. Learn. Res..

[6]  Nicolai Bissantz,et al.  Asymptotic normality and confidence intervals for inverse regression models with convolution-type operators , 2009, J. Multivar. Anal..

[7]  M. Hazelton,et al.  Cross‐validation Bandwidth Matrices for Multivariate Kernel Density Estimation , 2005 .

[8]  L. Wasserman All of Nonparametric Statistics , 2005 .

[9]  M. C. Jones,et al.  A reliable data-based bandwidth selection method for kernel density estimation , 1991 .

[10]  Kengo Kato,et al.  Central limit theorems and bootstrap in high dimensions , 2014, 1412.3661.

[11]  W. M. Wood-Vasey,et al.  SDSS-III: MASSIVE SPECTROSCOPIC SURVEYS OF THE DISTANT UNIVERSE, THE MILKY WAY, AND EXTRA-SOLAR PLANETARY SYSTEMS , 2011, 1101.1529.

[12]  Yen-Chi Chen,et al.  A tutorial on kernel density estimation and recent advances , 2017, 1704.03924.

[13]  D. W. Scott,et al.  Cross-Validation of Multivariate Densities , 1994 .

[14]  Victor Chernozhukov,et al.  Anti-concentration and honest, adaptive confidence bands , 2013 .

[15]  Cun-Hui Zhang,et al.  Confidence intervals for low dimensional parameters in high dimensional linear models , 2011, 1110.2563.

[16]  Jeffrey S. Racine,et al.  CROSS-VALIDATED LOCAL LINEAR NONPARAMETRIC REGRESSION , 2004 .

[17]  Uwe Einmahl,et al.  Uniform in bandwidth consistency of kernel-type function estimators , 2005 .

[18]  Harold D. Chiang,et al.  A Unified Robust Bootstrap Method for Sharp/Fuzzy Mean/Quantile Regression Discontinuity/Kink Designs , 2017 .

[19]  Sivaraman Balakrishnan,et al.  Confidence sets for persistence diagrams , 2013, The Annals of Statistics.

[20]  Yen-Chi Chen,et al.  Density Level Sets: Asymptotics, Inference, and Visualization , 2015, 1504.05438.

[21]  Tarn Duong,et al.  Local significant differences from nonparametric two-sample tests , 2013 .

[22]  Jörg Polzehl,et al.  Simultaneous bootstrap confidence bands in nonparametric regression , 1998 .

[23]  Yingcun Xia,et al.  Asymptotic Behavior of Bandwidth Selected by the Cross-Validation Method for Local Polynomial Fitting , 2002 .

[24]  Max H. Farrell,et al.  Coverage Error Optimal Confidence Intervals , 2018 .

[25]  V. Cardone,et al.  Colour and stellar population gradients in galaxies: correlation with mass , 2010, Monthly Notices of the Royal Astronomical Society.

[26]  Christopher R. Genovese,et al.  Asymptotic theory for density ridges , 2014, 1406.5663.

[27]  Wenceslao González-Manteiga,et al.  PLUG‐IN ESTIMATION OF GENERAL LEVEL SETS , 2006 .

[28]  D. W. Scott,et al.  Multivariate Density Estimation, Theory, Practice and Visualization , 1992 .

[29]  J. Brinkmann,et al.  New York University Value-Added Galaxy Catalog: A Galaxy Catalog Based on New Public Surveys , 2005 .

[30]  Jianqing Fan Local Linear Regression Smoothers and Their Minimax Efficiencies , 1993 .

[31]  Tarn Duong,et al.  ks: Kernel Density Estimation and Kernel Discriminant Analysis for Multivariate Data in R , 2007 .

[32]  Michael H. Neumann Automatic bandwidth choice and confidence intervals in nonparametric regression , 1995 .

[33]  Frédéric Chazal,et al.  Stochastic Convergence of Persistence Landscapes and Silhouettes , 2013, J. Comput. Geom..

[34]  Kjell A. Doksum,et al.  Uniform Confidence Bounds for Regression Based on a Simple Moving Average , 1985 .

[35]  P. Hall On Bootstrap Confidence Intervals in Nonparametric Regression , 1992 .

[36]  David Hinkley,et al.  Bootstrap Methods: Another Look at the Jackknife , 2008 .

[37]  S. Weisberg Applied Linear Regression , 1981 .

[38]  Kengo Kato,et al.  Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors , 2012, 1212.6906.

[39]  Robert P. Lieli,et al.  Estimating Conditional Average Treatment Effects , 2014 .

[40]  Yingcun Xia,et al.  UNIFORM BAHADUR REPRESENTATION FOR LOCAL POLYNOMIAL ESTIMATES OF M-REGRESSION AND ITS APPLICATION TO THE ADDITIVE MODEL , 2007, Econometric Theory.

[41]  R. R. Bahadur A Note on Quantiles in Large Samples , 1966 .

[42]  S. Panchapakesan,et al.  Measurement, Regression, and Calibration (Philip J. Brown) , 1995, SIAM Rev..

[43]  D. York,et al.  The Overdensities of Galaxy Environments as a Function of Luminosity and Color , 2002, astro-ph/0212085.

[44]  L. Wasserman,et al.  On the path density of a gradient field , 2008, 0805.4141.

[45]  S. Sheather Density Estimation , 2004 .

[46]  Victor Chernozhukov,et al.  Empirical and multiplier bootstraps for suprema of empirical processes of increasing complexity, and related Gaussian couplings , 2016 .

[47]  Adel Javanmard,et al.  Confidence intervals and hypothesis testing for high-dimensional regression , 2013, J. Mach. Learn. Res..

[48]  W. Härdle,et al.  Bootstrapping in Nonparametric Regression: Local Adaptive Smoothing and Confidence Bands , 1988 .

[49]  G. Michailidis,et al.  A two-stage hybrid procedure for estimating an inverse regression function , 2011, 1105.3018.

[50]  L. Wasserman Topological Data Analysis , 2016, 1609.08227.

[51]  Yingcun Xia,et al.  Bias‐corrected confidence bands in nonparametric regression , 1998 .

[52]  Wolfgang Härdle,et al.  BOOTSTRAP INFERENCE IN SEMIPARAMETRIC GENERALIZED ADDITIVE MODELS , 2004, Econometric Theory.

[53]  D. Freedman Bootstrapping Regression Models , 1981 .

[54]  Inge Koch,et al.  Highest Density Difference Region Estimation with Application to Flow Cytometric Data , 2009, Biometrical journal. Biometrische Zeitschrift.

[55]  Yu-Chin Hsu,et al.  (Preliminary: please do not cite or quote without permission.) , 2022 .

[56]  D. York,et al.  The Sloan Digital Sky Survey: Technical summary , 2000, astro-ph/0006396.

[57]  Changbao Wu,et al.  Jackknife, Bootstrap and Other Resampling Methods in Regression Analysis , 1986 .

[58]  L. Gleser Measurement, Regression, and Calibration , 1996 .

[59]  Sokbae Lee,et al.  Nonparametric Tests of Conditional Treatment Effects , 2009 .

[60]  A. Tsybakov On nonparametric estimation of density level sets , 1997 .

[61]  Wolfgang Härdle,et al.  Better Bootstrap Confidence Intervals for Regression Curve Estimation , 1995 .

[62]  Joseph P. Romano Bootstrapping the mode , 1988 .

[63]  Bootstrapping Regression Models .

[64]  R. Servien,et al.  Nonparametric estimation of regression level sets , 2011 .

[65]  S. Geer,et al.  On asymptotically optimal confidence regions and tests for high-dimensional models , 2013, 1303.0518.

[67]  Peter Hall,et al.  A simple bootstrap method for constructing nonparametric confidence bands for functions , 2013, 1309.4864.

[68]  W. Polonik Measuring Mass Concentrations and Estimating Density Contour Clusters-An Excess Mass Approach , 1995 .

[69]  Marie-Anne Gruet,et al.  A nonparametric calibration analysis , 1996 .

[70]  Song Xi Chen,et al.  Empirical likelihood confidence intervals for nonparametric density estimation , 1996 .

[71]  P. Hall Large Sample Optimality of Least Squares Cross-Validation in Density Estimation , 1983 .

[72]  Yen-Chi Chen,et al.  Generalized cluster trees and singular measures , 2016, The Annals of Statistics.

[73]  Franco Magno,et al.  A statistical overview on univariate calibration, inverse regression, and detection limits: Application to gas chromatography/mass spectrometry technique. , 2007, Mass spectrometry reviews.

[74]  Xiao-Hua Zhou,et al.  Treatment selection in a randomized clinical trial via covariate-specific treatment effect curves , 2017, Statistical methods in medical research.

[75]  C. Conselice,et al.  How does galaxy environment matter? The relationship between galaxy environments, colour and stellar mass at 0.4 < z < 1 in the Palomar/DEEP2 survey , 2010, 1009.3189.

[76]  James Stephen Marron,et al.  BOOTSTRAP SIMULTANEOUS ERROR BARS FOR NONPARAMETRIC REGRESSION , 1991 .

[77]  P. Hall EFFECT OF BIAS ESTIMATION ON COVERAGE ACCURACY OF BOOTSTRAP CONFIDENCE INTERVALS FOR A PROBABILITY DENSITY , 1992 .

[78]  E. Giné,et al.  Rates of strong uniform consistency for multivariate kernel density estimators , 2002 .

[79]  Sivaraman Balakrishnan,et al.  Statistical Inference for Cluster Trees , 2016, NIPS.

[80]  Kengo Kato,et al.  Gaussian approximation of suprema of empirical processes , 2014 .

[81]  Zeljko Ivezic,et al.  The Environment of Galaxies at Low Redshift , 2008, 0801.0312.

[82]  Kengo Kato,et al.  Comparison and anti-concentration bounds for maxima of Gaussian random vectors , 2013, 1301.4807.

[83]  C. Loader,et al.  Simultaneous Confidence Bands for Linear Regression and Smoothing , 1994 .

[84]  Enno Mammen,et al.  Confidence regions for level sets , 2013, J. Multivar. Anal..

[85]  Larry A. Wasserman,et al.  Nonparametric Ridge Estimation , 2012, ArXiv.

[86]  B. Cadre Kernel estimation of density level sets , 2005, math/0501221.

[87]  S. Roweis,et al.  An Improved Photometric Calibration of the Sloan Digital Sky Survey Imaging Data , 2007, astro-ph/0703454.

[88]  Max H. Farrell,et al.  On the Effect of Bias Estimation on Coverage Accuracy in Nonparametric Inference , 2015, Journal of the American Statistical Association.

[89]  M. Wand,et al.  EXACT MEAN INTEGRATED SQUARED ERROR , 1992 .

[90]  Scott M. Berry,et al.  Bayesian Smoothing and Regression Splines for Measurement Error Problems , 2002 .

[91]  C. Genovese,et al.  Detecting Effects of Filaments on Galaxy Properties in the Sloan Digital Sky Survey III , 2015, 1509.06376.

[92]  E. Nadaraya On Estimating Regression , 1964 .

[93]  Herbert Edelsbrunner,et al.  Persistent Homology: Theory and Practice , 2013 .

[94]  Yu-Chin Hsu,et al.  Robust uniform inference for quantile treatment effects in regression discontinuity designs , 2017, Journal of Econometrics.

[95]  C. D. Kemp,et al.  Density Estimation for Statistics and Data Analysis , 1987 .

[96]  Paul L. Speckman,et al.  Confidence bands in nonparametric regression , 1993 .

[97]  Christopher R. Genovese,et al.  Cosmic web reconstruction through density ridges: method and algorithm , 2015, 1501.05303.