Accurate and efficient gp120 V3 loop structure based models for the determination of HIV-1 co-receptor usage

Background: HIV-1 targets human cells that express both the CD4 receptor, which binds the viral envelope glycoprotein gp120, and either the CCR5 (R5) or CXCR4 (X4) co-receptor, which interacts primarily with the third hypervariable loop (V3 loop) of gp120. Determining HIV-1 affinity for the R5 or X4 co-receptor on host cells facilitates the inclusion of co-receptor antagonists in patient treatment strategies. A dataset of 1193 distinct gp120 V3 loop peptide sequences (989 R5-utilizing, 204 X4-capable) is used to train predictive classifiers based on implementations of random forest, support vector machine, boosted decision tree, and neural network machine learning algorithms. An in silico mutagenesis procedure, which employs multibody statistical potentials, computational geometry, and threading of variant V3 sequences onto an experimental structure, generates a feature vector representation for each variant; the components of the vector measure environmental perturbations at the corresponding structural positions.

Results: Classifier performance is evaluated using stratified 10-fold cross-validation, stratified dataset splits (2/3 training, 1/3 validation), and leave-one-out cross-validation. Best reported values of sensitivity (85%), specificity (100%), and precision (98%) for predicting X4-capable HIV-1 virus, overall accuracy (97%), Matthews correlation coefficient (89%), balanced error rate (0.08), and ROC area (0.97) all reach critical thresholds, suggesting that the models outperform six other state-of-the-art methods and come closer to competing with phenotype assays.

Conclusions: The trained classifiers provide instantaneous and reliable predictions of HIV-1 co-receptor usage, requiring only translated V3 loop genotypes as input. Furthermore, the novelty of the computational mutagenesis-based predictor attributes distinguishes the models as orthogonal and complementary to previous methods that use sequence, structure, and/or evolutionary information. The classifiers are available online at http://proteins.gmu.edu/automute.
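The performance measures named in the Results can all be derived from a single 2x2 confusion matrix in which X4-capable is treated as the positive class. The following is a minimal sketch of those standard definitions, not the authors' evaluation code; the function name `binary_metrics` and the example counts are hypothetical. Only the class sizes (204 X4-capable, 989 R5-utilizing) come from the abstract. Because the abstract reports best values across several evaluation schemes, one confusion matrix is not expected to reproduce all of them at once.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    tp, fn: X4-capable sequences predicted correctly / incorrectly
    tn, fp: R5-utilizing sequences predicted correctly / incorrectly
    """
    sensitivity = _ratio(tp, tp + fn)   # true positive rate
    specificity = _ratio(tn, tn + fp)   # true negative rate
    precision = _ratio(tp, tp + fp)     # positive predictive value
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    # Balanced error rate: mean of the two per-class error rates.
    ber = 0.5 * ((1.0 - sensitivity) + (1.0 - specificity))
    # Matthews correlation coefficient, defined as 0 when a margin is empty.
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    mcc = _ratio(tp * tn - fp * fn, mcc_denominator)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "balanced_error_rate": ber,
        "mcc": mcc,
    }


# Hypothetical counts chosen only to match the class sizes in the abstract:
# 170 + 34 = 204 X4-capable, 980 + 9 = 989 R5-utilizing.
example = binary_metrics(tp=170, fp=9, tn=980, fn=34)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

With a class imbalance of roughly 5:1, accuracy alone can look high even when many X4-capable sequences are missed, which is why the balanced error rate and the Matthews correlation coefficient are reported alongside it.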
