Prediction Inference with Ensemble Methods

Acknowledgements

I am deeply grateful for the support of the many people who helped me during these last months while I worked on this project. A special thank you to:

• Dipl. Stat. Nora Fenske for her excellent supervision, her accurate proofreading, her support, her commitment and her patience. I also want to thank her for making all her R code about quantile regression available to me.
• Prof. Dr. Torsten Hothorn for his excellent supervision and for the trust he put in me by offering me this thesis. I also want to thank him for looking through much of the R code that led to the simulations and for his essential input on how to interpret and evaluate prediction intervals.
• Prof. Dr. Helmut Küchenhoff and his team at the statistical consulting unit (especially Juliane Manitz and André Klima) for giving me the great opportunity to test our methods in practice with the movie data.
• Elisabeth Waldmann and Juliane Manitz for proofreading and their fair comments.
• Dipl. Stat. Michael Obermeier and Birgit Oppolzer for important LaTeX tips, for sharing the office with me while I worked on this thesis, for the 3 pm coffee therapy every day and for their moral support.
• Dipl. Stat. Benjamin Hofner for encouraging me to select this topic and for mailing me his thesis, which he allowed me to use as a guideline for many aspects.
• My friends, my family and Belén for their constant support and encouragement.
