Transformation boosting machines

The broad class of conditional transformation models includes interpretable and simple as well as potentially very complex models for conditional distributions. This makes conditional transformation models attractive for predictive distribution modelling, especially because models featuring interpretable parameters and black-box machines can be understood as extremes in a whole cascade of models. So far, algorithms and corresponding theory were developed only for special forms of conditional transformation models: maximum likelihood inference is available for rather simple models, a tailored boosting algorithm exists for the estimation of additive conditional transformation models, and a special form of random forests targets the estimation of interaction models. Here, I propose boosting algorithms capable of estimating conditional transformation models of arbitrary complexity, ranging from simple shift transformation models featuring linear predictors to essentially unstructured conditional transformation models allowing complex nonlinear interaction functions. Because a generic form of the likelihood is maximized, the novel boosting algorithms for conditional transformation models are applicable to all types of univariate response variables, including randomly censored or truncated observations.
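To make the idea of boosting a transformation-model likelihood concrete, here is a minimal Python sketch of the simplest case mentioned above: a shift transformation model with a linear predictor, P(Y <= y | x) = F_Z(theta0 + exp(theta1) * y - x^T beta). It is an illustration under stated assumptions, not the author's algorithm or implementation. The assumptions, none of which come from the abstract, are: F_Z is the standard normal CDF; the baseline transformation is linear in y (a real implementation would use a flexible monotone basis); responses are exact and uncensored; and base-learners are component-wise least-squares fits with step size `nu`. The function name `boost_shift_model` and all parameter names are hypothetical. Each iteration computes the gradient of the log-likelihood with respect to the shift predictor as pseudo-residuals, updates the best-fitting coefficient, and then re-maximizes the likelihood over the baseline parameters. With these particular choices the model coincides with a normal linear regression, so the sketch only demonstrates the mechanics.

```python
import math
import random


def boost_shift_model(y, X, n_iter=500, nu=0.1):
    """Hypothetical sketch of likelihood-based boosting for the shift model
    P(Y <= y | x) = Phi(theta0 + exp(theta1) * y - x^T beta).

    Assumptions: linear baseline transformation, standard normal F_Z,
    exact (uncensored) continuous responses, covariates given as rows of X.
    """
    n, p = len(y), len(X[0])
    # Centred covariate columns; the intercept is absorbed by theta0.
    cols = []
    for j in range(p):
        mj = sum(row[j] for row in X) / n
        cols.append([row[j] - mj for row in X])
    ybar = sum(y) / n
    yc = [yi - ybar for yi in y]
    syy = sum(v * v for v in yc)
    if syy == 0.0:
        raise ValueError("response must not be constant")

    def update_baseline(eta):
        # Exact maximiser of sum_i log phi(theta0 + s*y_i - eta_i) + n*log(s)
        # over theta0 and s = exp(theta1) > 0: positive root of
        # syy*s^2 - sye*s - n = 0, then theta0 = mean(eta) - s*mean(y).
        ebar = sum(eta) / n
        sye = sum(a * (e - ebar) for a, e in zip(yc, eta))
        s = (sye + math.sqrt(sye * sye + 4.0 * syy * n)) / (2.0 * syy)
        return ebar - s * ybar, s

    beta = [0.0] * p
    eta = [0.0] * n
    theta0, s = update_baseline(eta)
    for _ in range(n_iter):
        # Gradient of the log-likelihood w.r.t. eta_i: with
        # z_i = theta0 + s*y_i - eta_i and log f_Z(z) = -z^2/2 + const, it is z_i.
        r = [theta0 + s * yi - ei for yi, ei in zip(y, eta)]
        # Component-wise least-squares base-learners; keep the best fit.
        best_j, best_coef, best_rss = None, 0.0, float("inf")
        for j, xj in enumerate(cols):
            sxx = sum(v * v for v in xj)
            if sxx == 0.0:
                continue
            coef = sum(a * b for a, b in zip(xj, r)) / sxx
            rss = sum((ri - coef * v) ** 2 for ri, v in zip(r, xj))
            if rss < best_rss:
                best_j, best_coef, best_rss = j, coef, rss
        if best_j is None:
            break
        # Damped update of the selected shift coefficient and of eta.
        beta[best_j] += nu * best_coef
        eta = [e + nu * best_coef * v for e, v in zip(eta, cols[best_j])]
        # Re-estimate the baseline transformation given the new shift term.
        theta0, s = update_baseline(eta)
    return theta0, math.log(s), beta


if __name__ == "__main__":
    random.seed(1)
    # Simulated Y = 1 + 2*x1 + N(0, 0.5^2); x2 is pure noise.
    X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]
    y = [1.0 + 2.0 * row[0] + random.gauss(0, 0.5) for row in X]
    theta0, theta1, beta = boost_shift_model(y, X, n_iter=1000)
    # Expect exp(theta1) near 1/0.5 = 2 and beta near (2/0.5, 0) = (4, 0).
    print(round(math.exp(theta1), 2), [round(b, 2) for b in beta])
```

With a flexible baseline transformation, the closed-form baseline update would have to be replaced by a numerical optimisation step. Censored or truncated observations would change only the per-observation likelihood contributions, and hence the pseudo-residuals, while the boosting loop itself stays the same.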