AUC-based biomarker ensemble with an application on gene scores predicting low bone mineral density

MOTIVATION The area under the receiver operating characteristic (ROC) curve (AUC), long regarded as a 'gold standard' measure of the predictiveness of a continuous score, has created a need for AUC-based predictors. However, AUC-based ensemble methods are scarce, largely because the associated objective function is neither continuous nor concave. Indeed, there is no reliable numerical algorithm for identifying the optimal combination of a set of biomarkers to maximize the AUC, especially when the number of biomarkers is large.

RESULTS We propose a novel AUC-based statistical ensemble method for combining multiple biomarkers to differentiate a binary response of interest. Specifically, we propose to replace the non-continuous and non-convex AUC objective function with a convex surrogate loss function, whose minimizer can be identified efficiently. Within this framework, the lasso and other regularization techniques enable feature selection. Extensive simulations demonstrate the superiority of the new method over existing methods. The proposal has been applied to a gene expression dataset to construct gene expression scores that differentiate elderly women with low bone mineral density (BMD) from those with normal BMD. The AUCs of the resulting scores in the independent test dataset are satisfactory.

CONCLUSION By aiming to maximize the AUC directly, the proposed AUC-based ensemble method provides an efficient means of generating a stable combination of multiple biomarkers, which is especially useful in high-dimensional settings.

CONTACT lutian@stanford.edu.

SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
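To make the surrogate-loss idea concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not say which convex surrogate is used, so a pairwise logistic loss over all case-control pairs is assumed here, with an L1 (lasso) penalty fitted by proximal gradient descent. The function names `empirical_auc` and `fit_pairwise_logistic_lasso`, the penalty `lam`, and the simulated data are all hypothetical.

```python
import numpy as np

def empirical_auc(scores, y):
    """Share of (case, control) pairs ranked correctly; ties count as 1/2."""
    diff = scores[y == 1][:, None] - scores[y == 0][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def fit_pairwise_logistic_lasso(X, y, lam=0.05, n_iter=500):
    """Minimize mean log(1 + exp(-beta'(x_case - x_control))) + lam * ||beta||_1.

    Assumed surrogate: the step function 1{beta'd > 0} inside the empirical
    AUC is replaced by a smooth convex logistic loss on each pair difference d.
    """
    cases, controls = X[y == 1], X[y == 0]
    # All case-minus-control differences, shape (n_cases * n_controls, p).
    D = (cases[:, None, :] - controls[None, :, :]).reshape(-1, X.shape[1])
    # Step size 1/L, where L bounds the Lipschitz constant of the gradient.
    L = 0.25 * np.linalg.norm(D, 2) ** 2 / D.shape[0]
    step = 1.0 / L
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        margins = D @ beta
        # sigmoid(-margin), written with tanh for numerical stability.
        weights = 0.5 * (1.0 - np.tanh(0.5 * margins))
        grad = -(D.T @ weights) / D.shape[0]
        z = beta - step * grad
        # Soft-thresholding is the proximal step for the L1 penalty.
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return beta

# Simulated example (hypothetical data): only the first marker is informative.
rng = np.random.default_rng(0)
y = np.repeat([1, 0], 50)
X = rng.normal(size=(100, 5))
X[y == 1, 0] += 2.0

beta = fit_pairwise_logistic_lasso(X, y)
auc = empirical_auc(X @ beta, y)
```

Because the surrogate is convex, any reasonable first-order solver reaches its minimizer. The soft-thresholding step is what allows the L1 penalty to shrink coefficients of uninformative markers to exactly zero.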
