Bootstrap estimated true and false positive rates and ROC curve

Diagnostic studies and new biomarkers are assessed by the estimated true and false positive rates of the classification rule. One diagnostic rule is considered for high-dimensional predictor data. Cross-validation and the leave-one-out bootstrap are discussed as ways to estimate the true and false positive rates of classifiers built with the machine learning methods AdaBoost, bagging, random forests, (penalized) logistic regression and support vector machines. The .632+ bootstrap estimator of the misclassification error was previously proposed to correct for the overfitting of the apparent error. This idea is generalized to the estimation of true and false positive rates. Tree-based simulation models with 8 and 50 binary non-informative variables are analysed to examine the properties of the estimators. Finally, a bootstrap estimation of receiver operating characteristic (ROC) curves is suggested and a .632+ bootstrap estimation of ROC curves is discussed. This approach is applied to high-dimensional gene expression data on leukemia and to image-derived predictors for glaucoma diagnosis.
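
As a rough illustration of the leave-one-out bootstrap idea mentioned above, the following Python sketch estimates true and false positive rates from out-of-bag predictions only and compares them with the apparent (resubstitution) rates. It is not the authors' implementation. The threshold classifier `fit_threshold`, the helper names, the simulated data and the number of bootstrap replications are all assumptions made for the example. The final combination uses the fixed .632 weights of the plain .632 estimator applied to the rates by analogy; the .632+ variant replaces these with a data-dependent weight, which is not reproduced here.

```python
import random


def fit_threshold(xs, ys):
    """Toy one-feature classifier (hypothetical stand-in for any learner):
    cut halfway between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    if not pos or not neg:
        return None  # the sample is missing one class
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)
    cut = (mean_pos + mean_neg) / 2
    if mean_pos >= mean_neg:
        return lambda x: 1 if x > cut else 0
    return lambda x: 1 if x < cut else 0


def rates(preds, ys):
    """Return (TPR, FPR) of 0/1 predictions against 0/1 labels."""
    tp = sum(1 for p, y in zip(preds, ys) if y == 1 and p == 1)
    fp = sum(1 for p, y in zip(preds, ys) if y == 0 and p == 1)
    return tp / ys.count(1), fp / ys.count(0)


def loo_bootstrap_rates(xs, ys, n_boot=200, seed=0):
    """Leave-one-out bootstrap: each observation is predicted only by
    rules fitted on bootstrap samples that do not contain it."""
    rng = random.Random(seed)
    n = len(xs)
    votes = [[0, 0] for _ in range(n)]  # [positive predictions, times out-of-bag]
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        rule = fit_threshold([xs[i] for i in idx], [ys[i] for i in idx])
        if rule is None:
            continue
        in_bag = set(idx)
        for i in range(n):
            if i not in in_bag:
                votes[i][0] += rule(xs[i])
                votes[i][1] += 1

    def class_rate(label):
        # Average per observation first, then over the observations of the
        # class; observations that were never out-of-bag are skipped.
        per_obs = [p / m for (p, m), y in zip(votes, ys) if y == label and m > 0]
        return sum(per_obs) / len(per_obs)

    return class_rate(1), class_rate(0)  # (TPR, FPR)


# Simulated data (assumption): 30 diseased and 30 healthy subjects.
data_rng = random.Random(1)
xs = [data_rng.gauss(2.0, 1.0) for _ in range(30)] + \
     [data_rng.gauss(0.0, 1.0) for _ in range(30)]
ys = [1] * 30 + [0] * 30

# Apparent rates: fit and evaluate on the same data (optimistic).
full_rule = fit_threshold(xs, ys)
tpr_app, fpr_app = rates([full_rule(x) for x in xs], ys)

# Leave-one-out bootstrap rates (tend to be pessimistic).
tpr_loo, fpr_loo = loo_bootstrap_rates(xs, ys)

# Fixed-weight .632 combination, applied to the rates by analogy (assumption).
tpr_632 = 0.368 * tpr_app + 0.632 * tpr_loo
fpr_632 = 0.368 * fpr_app + 0.632 * fpr_loo

print(f"apparent TPR={tpr_app:.3f} FPR={fpr_app:.3f}")
print(f"LOO boot TPR={tpr_loo:.3f} FPR={fpr_loo:.3f}")
print(f".632     TPR={tpr_632:.3f} FPR={fpr_632:.3f}")
```

Sweeping the decision threshold instead of fixing it at the midpoint would yield pairs of (FPR, TPR) values, which is one way such estimates could be extended to a bootstrap ROC curve.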
