Paper information - Super Learning: An Application to Prediction of HIV-1 Drug Susceptibility

Super Learning: An Application to Prediction of HIV-1 Drug Susceptibility

Many statistical methods can be used to learn a predictor from observed data. Examples include decision trees, neural networks, support vector regression, least angle regression, Logic Regression, and the Deletion/Substitution/Addition algorithm. The optimal algorithm for prediction varies with the underlying data-generating distribution. In this article, we introduce a “super learner,” a prediction algorithm that applies any set of candidate learners and uses cross-validation to select among them. Theory shows that, asymptotically, the super learner performs essentially as well as or better than any of the candidate learners. We briefly present the theory behind the super learner, then provide an example from research on predicting the in vitro phenotypic susceptibility of HIV-1 to antiretroviral drugs from viral mutations. We apply the super learner to predict susceptibility to one protease inhibitor, nelfinavir, using a set of database-derived nonpolymorphic treatment-selected protease mutations.
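
The selection step described in the abstract (fit every candidate learner, estimate each one's risk by cross-validation, keep the one with the smallest risk) can be sketched as follows. This is a minimal illustration under stated assumptions, not the article's implementation: the three toy candidates, the squared-error loss, the interleaved fold assignment, the function names, and the synthetic data are all hypothetical choices made for the example, and the data are unrelated to the HIV-1 mutation data.

```python
import random
from statistics import mean


def fit_mean(xs, ys):
    """Toy candidate 1: ignore the covariate, always predict the training mean."""
    mu = mean(ys)
    return lambda x: mu


def fit_line(xs, ys):
    """Toy candidate 2: ordinary least squares on a single covariate."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


def fit_nearest(xs, ys):
    """Toy candidate 3: 1-nearest-neighbour regression."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def cv_risk(fit, xs, ys, k=5):
    """Estimate mean squared prediction error by k-fold cross-validation.

    Folds are assigned deterministically by index modulo k (an assumption
    made here to keep the sketch reproducible).
    """
    n = len(xs)
    total = 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        total += sum((ys[i] - predict(xs[i])) ** 2 for i in test)
    return total / n


def select_learner(candidates, xs, ys, k=5):
    """Pick the candidate with the smallest cross-validated risk,
    then refit that candidate on all of the data."""
    risks = {name: cv_risk(fit, xs, ys, k) for name, fit in candidates.items()}
    best = min(risks, key=risks.get)
    return best, risks, candidates[best](xs, ys)


# Hypothetical candidate library and synthetic data (linear signal plus noise).
CANDIDATES = {"mean": fit_mean, "line": fit_line, "nearest": fit_nearest}
rng = random.Random(0)
xs = [i / 10 for i in range(50)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.1) for x in xs]

best, risks, predictor = select_learner(CANDIDATES, xs, ys)
print("selected:", best)
print("cross-validated risks:", risks)
```

Because each candidate is scored only on observations held out from its own fit, a flexible learner that overfits is not favored merely for fitting the training data closely. Which candidate wins depends on the data-generating distribution, which is the motivation the abstract gives for selecting among learners rather than fixing one in advance.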
