Robust Distributional Regression with Automatic Variable Selection

Datasets with extreme observations and/or heavy-tailed error distributions are common, and from a statistical perspective they should be analyzed with these features in mind. Small deviations from an assumed model, such as the presence of outliers, can cause classical regression procedures to break down, potentially leading to unreliable inferences. Other distributional features, such as heteroscedasticity, can be handled by going beyond the mean and modelling the scale parameter in terms of covariates. We propose a method that accounts for heavy tails and heteroscedasticity through the use of a generalized normal distribution (GND). The GND contains a kurtosis-characterizing shape parameter that moves the model smoothly between the normal distribution and the heavier-tailed Laplace distribution, thus covering both classical and robust regression. A key component of statistical modelling is determining the set of covariates that influence the response variable. While correctly accounting for kurtosis and heteroscedasticity is crucial to this endeavour, a procedure for variable selection is still required. For this purpose, we use a novel penalized estimation procedure that avoids the typical computationally demanding grid search for tuning parameters. This is particularly valuable in the distributional regression setting, where the location and scale parameters depend on covariates, since the standard approach would have multiple tuning parameters (one for each distributional parameter). We achieve this by using a "smooth information criterion" that can be optimized directly, where the tuning parameters are fixed at log(n) in the BIC case.
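To make the two ingredients of the abstract concrete, here is a minimal NumPy sketch of a GND location-scale log-likelihood and a smoothed BIC objective. It is not the authors' implementation, and several details are assumptions: the density parameterization f(y) = s / (2σΓ(1/s)) exp(-|(y - μ)/σ|^s), under which s = 2 gives a normal (standard deviation σ/√2) and s = 1 a Laplace with scale σ; a log link for σ and for s; a single unpenalized shape parameter; unpenalized intercepts in column 0 of X and Z; and the surrogate b²/(b² + ε²) for the indicator 1(b ≠ 0). All function names and the layout of `theta` are hypothetical.

```python
import math

import numpy as np


def gnd_logpdf(y, mu, sigma, s):
    """Log-density of the GND (assumed parameterization):
    f(y) = s / (2 * sigma * Gamma(1/s)) * exp(-|(y - mu) / sigma| ** s)."""
    z = np.abs((y - mu) / sigma)
    return np.log(s) - np.log(2.0 * sigma) - math.lgamma(1.0 / s) - z ** s


def neg_loglik(theta, y, X, Z):
    """Negative log-likelihood of a heteroscedastic GND regression.

    Hypothetical layout: theta = [beta (p), alpha (q), tau], with
    mu = X @ beta, log(sigma) = Z @ alpha and log(s) = tau (assumed log links).
    """
    p, q = X.shape[1], Z.shape[1]
    beta, alpha, tau = theta[:p], theta[p:p + q], theta[p + q]
    mu = X @ beta
    sigma = np.exp(Z @ alpha)
    s = math.exp(tau)
    return -np.sum(gnd_logpdf(y, mu, sigma, s))


def smooth_l0(b, eps):
    """Smooth surrogate for 1(b != 0): 0 at b = 0, close to 1 when |b| >> eps."""
    return b ** 2 / (b ** 2 + eps ** 2)


def smooth_bic(theta, y, X, Z, eps):
    """-2 * loglik + log(n) * (smoothly counted nonzero slopes).

    The same fixed weight log(n) applies to location and scale slopes, so no
    grid search over tuning parameters is needed. Intercepts (column 0 of
    X and Z, by assumption) and the shape parameter are not penalized.
    """
    n = len(y)
    p, q = X.shape[1], Z.shape[1]
    slopes = np.concatenate([theta[1:p], theta[p + 1:p + q]])
    return 2.0 * neg_loglik(theta, y, X, Z) + math.log(n) * np.sum(smooth_l0(slopes, eps))
```

For ε > 0 the penalty term is differentiable, so `smooth_bic` can be handed directly to a standard gradient-based optimizer. As ε shrinks, each nonzero slope contributes roughly log(n) to the objective, which recovers the usual BIC parameter count.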

Meadhbh O'Neill | Kevin Burke
