Overrating Classifier Performance in ROC Analysis in the Absence of a Test Set: Evidence from Simulation and Italian CARATkids Validation

BACKGROUND  Receiver operating characteristic (ROC) analysis has become common in biomedical research to support decision making. However, sensitivity, specificity, and misclassification rates are still often estimated on the same sample used to select the cutoff (the training sample), which risks overrating the performance of the test.

METHODS  A simulation study was performed to highlight the inferential implications of splitting (or not splitting) the dataset into training and test sets. The classifier was assumed to be normally distributed conditional on disease status, and Youden's criterion was used to select the optimal cutoff. An ROC analysis with sample splitting was then applied to assess the discriminant validity of the Italian version of the Control of Allergic Rhinitis and Asthma Test (CARATkids) questionnaire for children with asthma and rhinitis, for which recent studies may have reported optimistic performance estimates.

RESULTS  The simulation study showed that both a single split and cross-validation (CV) provided unbiased estimators of sensitivity, specificity, and misclassification rate, therefore allowing the computation of confidence intervals. For the Italian CARATkids questionnaire, the misclassification rate estimated by fivefold CV was 0.22 (95% confidence interval 0.14 to 0.30), indicating acceptable discriminant validity.

CONCLUSIONS  Splitting the data into training and test sets avoids overrating test performance in ROC analysis. Assessed with this method, the Italian CARATkids is valid for assessing disease control in children with asthma and rhinitis.
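The contrast described in the abstract can be illustrated with a minimal simulation sketch. It is not the authors' code: the group means (0 and 1), the common standard deviation (1), the sample size (30 per group), the number of replicates, and all function names are assumptions chosen for illustration. The classifier is drawn from a normal distribution given disease status, the cutoff is chosen by Youden's criterion, and the estimated misclassification rate is compared with the true rate of the selected cutoff, once when the cutoff is evaluated on the sample that selected it and once when it is evaluated on a held-out half.

```python
import math
import numpy as np


def youden_cutoff(neg, pos):
    """Cutoff maximizing J = sensitivity + specificity - 1 (positive if x > c)."""
    candidates = np.sort(np.concatenate([neg, pos]))
    sens = (pos[None, :] > candidates[:, None]).mean(axis=1)
    spec = (neg[None, :] <= candidates[:, None]).mean(axis=1)
    return candidates[np.argmax(sens + spec - 1.0)]


def error_rate(neg, pos, c):
    """Empirical misclassification rate at cutoff c."""
    wrong = (neg > c).sum() + (pos <= c).sum()
    return wrong / (neg.size + pos.size)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def true_error_rate(c, mu_neg, mu_pos, sd):
    """Population misclassification rate at cutoff c, assuming 50% prevalence."""
    false_pos = 1.0 - normal_cdf((c - mu_neg) / sd)
    false_neg = normal_cdf((c - mu_pos) / sd)
    return 0.5 * false_pos + 0.5 * false_neg


def simulate(n_rep=2000, n_per_group=30, mu_neg=0.0, mu_pos=1.0, sd=1.0, seed=0):
    """Mean (estimated - true) error rate without and with a sample split.

    All parameter values are illustrative assumptions, not the paper's settings.
    """
    rng = np.random.default_rng(seed)
    half = n_per_group // 2
    no_split, split = [], []
    for _ in range(n_rep):
        neg = rng.normal(mu_neg, sd, n_per_group)
        pos = rng.normal(mu_pos, sd, n_per_group)

        # No split: cutoff selected and evaluated on the same data.
        c_all = youden_cutoff(neg, pos)
        no_split.append(error_rate(neg, pos, c_all)
                        - true_error_rate(c_all, mu_neg, mu_pos, sd))

        # Single split: cutoff from the first half, evaluated on the second.
        c_train = youden_cutoff(neg[:half], pos[:half])
        split.append(error_rate(neg[half:], pos[half:], c_train)
                     - true_error_rate(c_train, mu_neg, mu_pos, sd))
    return float(np.mean(no_split)), float(np.mean(split))


if __name__ == "__main__":
    bias_no_split, bias_split = simulate()
    print(f"mean bias without split: {bias_no_split:+.3f}")
    print(f"mean bias with single split: {bias_split:+.3f}")
```

Under these assumptions, a negative mean bias without a split corresponds to the overrating described above, while the held-out estimate should average close to zero bias. The sketch covers only the misclassification rate and a single split; sensitivity, specificity, and cross-validation would follow the same pattern.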
