Most Likely Transformations: The mlt Package

The mlt package implements maximum likelihood estimation in the class of conditional transformation models. Based on a suitable explicit parameterization of the unconditional or conditional transformation function, built with infrastructure from package basefun, we show how one can define, estimate, and compare a cascade of increasingly complex transformation models in the maximum likelihood framework. Models for the unconditional or conditional distribution function of any univariate response variable are set up and estimated in the same computational framework simply by choosing an appropriate transformation function and a parameterization thereof. Because the distribution function is computationally cheap to evaluate, models can be estimated by maximizing the exact likelihood, especially in the presence of random censoring or truncation. The relatively dense high-level implementation in the R system for statistical computing allows many established implementations of linear transformation models, such as the Cox model or other parametric models for the analysis of survival or ordered categorical data, to be generalized to the more complex situations illustrated in this paper.
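
The role of the distribution function in the exact likelihood can be made concrete with a short sketch. The notation is an assumption introduced here for illustration and is not fixed by the abstract: $F_Z$ is a chosen reference distribution function with density $f_Z$, $h$ is the transformation function (monotone in $y$), $a$ is a vector of basis functions, and $\vartheta$ is the parameter vector.

```latex
% Conditional transformation model: the response distribution is the
% reference distribution evaluated at the transformed response.
\[
  P(Y \le y \mid X = x) = F_Z\bigl(h(y \mid x)\bigr),
  \qquad
  h(y \mid x) = a(y, x)^\top \vartheta .
\]

% Exact likelihood contribution of an observation known only to lie in
% the interval (\underline{y}, \bar{y}] (interval censoring; left or right
% censoring follows by letting one bound go to -\infty or +\infty):
\[
  L\bigl(\vartheta \mid (\underline{y}, \bar{y}], x\bigr)
  = F_Z\bigl(h(\bar{y} \mid x)\bigr) - F_Z\bigl(h(\underline{y} \mid x)\bigr).
\]

% For a precisely observed continuous response, the interval shrinks to a
% point and the contribution is approximated by the density:
\[
  L(\vartheta \mid y, x)
  \approx f_Z\bigl(h(y \mid x)\bigr)\,
  \frac{\partial h(y \mid x)}{\partial y}.
\]

% Truncation to (y_l, y_r] rescales a contribution by the probability
% of the truncation interval:
\[
  L_{\text{trunc}}
  = \frac{L}{F_Z\bigl(h(y_r \mid x)\bigr) - F_Z\bigl(h(y_l \mid x)\bigr)} .
\]
```

In this sketch every contribution needs only evaluations of $F_Z$ (or $f_Z$) at transformed values, so censored, truncated, and exact observations can all enter the same likelihood.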
