Prediction of human immunodeficiency virus type 1 drug resistance: Representation of target sequence mutational patterns via an n-grams approach

Antiretroviral medications for treating human immunodeficiency virus type 1 (HIV-1) infection, in particular inhibitors of the HIV-1 protease (PR) and reverse transcriptase (RT) enzymes, are vulnerable to the emergence of target mutations leading to drug resistance. Here we explore the relationship between PR and RT mutational patterns and the corresponding changes in susceptibility to each of their eight and 11 inhibitors, respectively, by developing drug-specific predictive models of resistance trained on previously assayed, publicly available in vitro mutant data. For each inhibitor, we report tenfold cross-validation performance measures for both classification and regression statistical learning algorithms. In each case we compare two approaches, which represent mutant protein sequences as feature vectors using either the relative frequencies or the counts of n-grams. To the best of our knowledge, this is the first reported study of predictive models of HIV-1 PR and RT drug resistance that uses n-grams to generate sequence attributes. Our technique is complementary to other sequence-based approaches and is competitive in performance. In a novel application, we classify every pair of RT inhibitors as either potentially effective as part of a larger drug cocktail or a combination that should not be administered concomitantly, with results that closely mirror available clinical and experimental data.
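The n-gram representation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the standard 20-letter amino-acid alphabet, overlapping n-grams taken with a sliding window, and a fixed vocabulary of all 20^n possible n-grams. The names `AMINO_ACIDS`, `ngram_vocabulary` and `ngram_features` are hypothetical, and windows containing any other symbol (for example an ambiguous residue) are simply skipped.

```python
from collections import Counter
from itertools import product

# Assumption: the standard 20-letter amino-acid alphabet. How the original
# study handled ambiguous or mixed residues is not specified here.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngram_vocabulary(n: int) -> list:
    """Hypothetical helper: all 20**n possible n-grams in a fixed order."""
    return ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]


def ngram_features(seq: str, n: int, relative: bool = False) -> list:
    """Hypothetical helper: map a protein sequence to an n-gram feature vector.

    Overlapping n-grams are taken with a sliding window of width n; windows
    containing a symbol outside AMINO_ACIDS are skipped. With relative=False
    the entries are raw counts; with relative=True each count is divided by
    the number of retained n-grams, so the entries sum to 1.
    """
    valid = set(AMINO_ACIDS)
    counts = Counter(
        seq[i:i + n]
        for i in range(len(seq) - n + 1)
        if set(seq[i:i + n]) <= valid
    )
    total = sum(counts.values())
    vector = []
    for gram in ngram_vocabulary(n):
        c = counts.get(gram, 0)
        # Guard against division by zero when no valid n-gram was found.
        vector.append(c / total if relative and total else float(c))
    return vector


if __name__ == "__main__":
    # Toy 10-residue sequences, not real PR or RT data, differing by a
    # hypothetical single substitution at the last position (L -> I).
    wild_type = "PQITLWQRPL"
    mutant = "PQITLWQRPI"
    vocab = ngram_vocabulary(2)
    wt = ngram_features(wild_type, 2)
    mt = ngram_features(mutant, 2)
    # Bigram features whose values differ between the two sequences.
    changed = [g for g, a, b in zip(vocab, wt, mt) if a != b]
    print(changed)  # ['PI', 'PL']
```

For example, `ngram_features("AAC", 1)` gives a 20-element vector with 2.0 in the A position and 1.0 in the C position, and `relative=True` rescales these to 2/3 and 1/3. Because the vocabulary grows as 20^n (400 bigrams, 8,000 trigrams), most entries are zero for larger n, so a sparse representation would be the natural choice in practice.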
