LASSO-type penalization in the framework of generalized additive models for location, scale and shape

Many applications call for full probabilistic forecasts, which assign a probability to each possible outcome. Attention is therefore steadily shifting from conditional mean models to probabilistic distributional models that capture the location, scale, shape, and other aspects of the response distribution. One of the most established models for distributional regression is the generalized additive model for location, scale and shape (GAMLSS). In high-dimensional settings, classical fitting procedures for GAMLSS often become unstable, and methods for variable selection are desirable. We therefore propose a regularization approach for high-dimensional data within the GAMLSS framework. It is designed for linear covariate effects and is based on L1-type penalties. Three penalization options are provided: the conventional least absolute shrinkage and selection operator (LASSO) for metric covariates, and both group and fused LASSO for categorical predictors. The methods are investigated on simulated data and on two real data examples, namely Munich rent data and data on extreme operational losses from the Italian bank UniCredit.
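To make the three penalization options concrete, the following is a minimal Python sketch of how each penalty term could be evaluated for a given coefficient vector. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the square-root group-size weight follows a common group LASSO convention, and the fused penalty is assumed to act on differences between category levels, with the reference level fixed at zero. The abstract does not specify the exact weighting used in the paper.

```python
import math


def lasso_penalty(beta):
    """Conventional LASSO term: sum of absolute coefficients.

    Intended for the coefficients of metric covariates.
    """
    return sum(abs(b) for b in beta)


def group_lasso_penalty(beta_group):
    """Group LASSO term for the dummy coefficients of ONE categorical predictor.

    Assumption: the group is weighted by sqrt(group size), a common
    convention; the paper's exact weights are not given in the abstract.
    """
    euclidean_norm = math.sqrt(sum(b * b for b in beta_group))
    return math.sqrt(len(beta_group)) * euclidean_norm


def fused_lasso_penalty(beta_group, ordinal=False):
    """Fused LASSO term for ONE categorical predictor.

    Assumption: the reference category has coefficient 0. For nominal
    factors all pairwise level differences are penalized; for ordinal
    factors only differences between adjacent levels are penalized.
    """
    levels = [0.0] + list(beta_group)  # prepend the reference level
    if ordinal:
        return sum(abs(levels[i] - levels[i - 1]) for i in range(1, len(levels)))
    return sum(
        abs(levels[i] - levels[j])
        for i in range(len(levels))
        for j in range(i)
    )
```

In a penalized fit, a term of this kind would be multiplied by a tuning parameter and subtracted from the log-likelihood for each distribution parameter. The group penalty sets all dummy coefficients of a factor to zero together, whereas the fused penalty can also merge individual categories by shrinking their coefficients toward each other.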
