Multivariable Modeling Strategies

Chapter 2 dealt with aspects of modeling such as transforming predictors, relaxing linearity assumptions, modeling interactions, and examining lack of fit. Chapter 3 dealt with missing data, focusing on the use of incomplete predictor information. All of these areas are important in the overall scheme of model development, and they cannot be separated from what follows. In this chapter we turn to issues that concern the whole model, with emphasis on deciding how much complexity to allow in the model and on dealing with large numbers of predictors. The chapter concludes with three default modeling strategies, chosen according to whether the goal is prediction, estimation, or hypothesis testing.
