Investigation of Super Learner Methodology on HIV-1 Small Sample: Application on Jaguar Trial Data

Background. Many statistical models have been tested to predict phenotypic or virological response from genotypic data. A statistical framework called Super Learner has been introduced either to compare different methods/learners (discrete Super Learner) or to combine them into a single Super Learner predictor. Methods. We applied the Super Learner framework to data from the Jaguar trial, an “add-on” trial comparing the efficacy of adding didanosine to an ongoing failing regimen. We also investigated the impact of different cross-validation strategies and different loss functions. Four different splits between training set and validation set were tested with two loss functions. Six statistical methods were compared. We assessed performance by R² values and accuracy by the rate of patients correctly classified. Results. The more recent Super Learner methodology, which builds a new predictor as a weighted combination of different methods/learners, performed well. A simple linear model gave results similar to those of this new predictor. Slight discrepancies arose between the two loss functions investigated, and also between results based on cross-validated risks and results from the full dataset. The Super Learner methodology and the linear model each classified around 80% of patients correctly. The difference between the lowest and highest rates was around 10 percent. The number of mutations retained by the different learners varied from one to 41. Conclusions. The more recent Super Learner methodology, which combines the predictions of many learners, performed well on our small dataset.
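The two uses of the framework described above can be illustrated with a minimal sketch. It is not the analysis performed in the study: the two toy learners, the grid search over a single weight, the fold assignment, and the synthetic data are all assumptions chosen to keep the example short and self-contained. The discrete Super Learner selects the learner with the lowest cross-validated risk, while the combined predictor is a convex combination of the learners whose weight minimizes the same cross-validated risk under a chosen loss function.

```python
def fit_mean(xs, ys):
    """Toy learner 1: always predicts the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Toy learner 2: one-feature ordinary least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return lambda x: my + slope * (x - mx)

def squared_loss(y, p):
    return (y - p) ** 2

def absolute_loss(y, p):
    return abs(y - p)

def risk(ys, preds, loss):
    """Average loss of predictions against observed outcomes."""
    return sum(loss(y, p) for y, p in zip(ys, preds)) / len(ys)

def cv_predictions(learners, xs, ys, n_folds):
    """Out-of-fold predictions: each point is predicted by models
    fitted without it (fold = index modulo n_folds, an assumption)."""
    n = len(xs)
    preds = {name: [0.0] * n for name in learners}
    for k in range(n_folds):
        train = [i for i in range(n) if i % n_folds != k]
        valid = [i for i in range(n) if i % n_folds == k]
        tx = [xs[i] for i in train]
        ty = [ys[i] for i in train]
        for name, fit in learners.items():
            model = fit(tx, ty)
            for i in valid:
                preds[name][i] = model(xs[i])
    return preds

def super_learner(learners, xs, ys, n_folds=5, loss=squared_loss, grid=21):
    """Discrete and combined Super Learner for exactly two learners."""
    if len(learners) != 2:
        raise ValueError("this sketch handles exactly two learners")
    preds = cv_predictions(learners, xs, ys, n_folds)
    first, second = list(learners)
    cv_risks = {name: risk(ys, preds[name], loss) for name in learners}
    # Discrete Super Learner: the single learner with the lowest CV risk.
    discrete = min(cv_risks, key=cv_risks.get)
    # Combined predictor: grid search over the weight w given to `first`.
    best_w, best_risk = 0.0, float("inf")
    for j in range(grid):
        w = j / (grid - 1)
        combo = [w * a + (1 - w) * b
                 for a, b in zip(preds[first], preds[second])]
        r = risk(ys, combo, loss)
        if r < best_risk:
            best_w, best_risk = w, r
    return {"cv_risks": cv_risks, "discrete": discrete,
            "weight_on_first": best_w, "combined_risk": best_risk}

# Synthetic toy data (NOT Jaguar trial data): a noisy linear trend.
xs = [float(i) for i in range(20)]
ys = [0.5 + 0.1 * x + ((i * 7) % 5 - 2) * 0.05 for i, x in enumerate(xs)]

learners = {"mean": fit_mean, "linear": fit_linear}
result = super_learner(learners, xs, ys, n_folds=5, loss=squared_loss)
print(result["discrete"], result["weight_on_first"])
```

Passing `loss=absolute_loss` or a different `n_folds` reruns the same selection under another loss function or training/validation split, which is the kind of sensitivity the abstract reports on. Because the grid includes the weights 0 and 1, the combined cross-validated risk in this sketch can never exceed that of the best single learner.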
