Data-intensive analysis of HIV mutations

Mutations in the HIV reverse transcriptase and protease of infected patients may be related to drug resistance. Several issues make it difficult to fully elucidate the relationship between these mutations and drug resistance. Examples include cross-resistance and the limitations in detecting the relevance of resistance. Lookup tables and rule-based systems attempt to classify sequences and predict treatment failure. However, they depend on the scientific literature and on its quality and reliability. Data-intensive analysis of HIV mutation databases may help to corroborate or improve the knowledge spread across that literature. Pattern recognition algorithms classify data by extracting information from different data domains. Clustering and biclustering algorithms have been used to group scientific and business data according to similarity measures. K-means is a popular clustering algorithm, and Bimax is a biclustering algorithm for binary data.

In this context, the main contribution of this work is a new methodology based on K-means and Bimax. It uses a binary representation of reverse transcriptase and protease sequences in an attempt to obtain an unsupervised classification of the sequences that may be related to drug resistance. We analyze 14,393 sequences with pattern recognition algorithms. Each sequence is restricted to selected protein positions known to be related to drug resistance and is represented as a binary vector in an 82-dimensional space. We produce a visualization of these vectors suited to medical interpretation. It indicates some correspondence with the drug-resistance predictions of the Brazilian lookup table used by Brazilian physicians, although that table depends on the HIV literature and its quality.
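The binary representation and the K-means step can be illustrated with a small sketch. It assumes that each vector dimension records whether a sequence carries a mutation at one selected position. The five positions in `POSITIONS`, the helper names `encode`, `sq_dist` and `kmeans`, and any toy data are hypothetical, because the abstract does not list the 82 positions or the authors' K-means settings. The sketch uses a plain Lloyd-style iteration, which is not necessarily the K-means variant used in the study.

```python
import random
from typing import List, Optional, Sequence, Set, Tuple

# Hypothetical subset of resistance-associated positions as (protein, position).
# The study uses 82 selected positions, which the abstract does not list.
POSITIONS: List[Tuple[str, int]] = [
    ("PR", 30), ("PR", 90), ("RT", 103), ("RT", 184), ("RT", 215),
]


def encode(mutated: Set[Tuple[str, int]]) -> List[int]:
    """Binary vector: 1 if the sequence is mutated at that position, else 0."""
    return [1 if p in mutated else 0 for p in POSITIONS]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(data: List[List[int]], k: int, max_iter: int = 100, seed: int = 0):
    """Lloyd-style K-means; returns (labels, centroids)."""
    # Seed centroids from distinct vectors, since binary data repeats often.
    distinct = sorted({tuple(v) for v in data})
    if k > len(distinct):
        raise ValueError("k exceeds the number of distinct vectors")
    rng = random.Random(seed)
    centroids = [list(map(float, v)) for v in rng.sample(distinct, k)]
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in data
        ]
        if new_labels == labels:
            break  # no label changed, so the clustering has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(data, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

For example, `kmeans([encode(s) for s in sequences], k=2)` returns one cluster label per sequence plus the centroids, where `sequences` is a hypothetical list of sets of mutated positions. On 0/1 vectors, the squared Euclidean distance between two sequences equals the number of positions at which their mutation status differs. Each centroid coordinate is the fraction of the cluster's members mutated at that position, so a centroid can be read as a mutation profile.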
In summary, this work describes a methodology that applies pattern recognition algorithms to binary data to suggest clusters of mutations and their relationships with drug resistance, using a different cluster visualization scheme.
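Bimax reports inclusion-maximal biclusters of a binary matrix. These are groups of rows (here, sequences) and columns (here, positions) whose submatrix is all ones and cannot be enlarged. The sketch below enumerates those same objects for a small matrix. However, it does not reproduce Bimax's own divide-and-conquer search. Instead, it closes the set of row supports under intersection, which is practical only while the number of maximal biclusters stays manageable. The function name `maximal_biclusters` and the `min_rows`/`min_cols` filters are hypothetical illustrations, not the authors' code or the parameters of any Bimax implementation.

```python
from typing import FrozenSet, List, Sequence, Tuple

# (row indices, column indices) of an all-ones submatrix.
Bicluster = Tuple[FrozenSet[int], FrozenSet[int]]


def maximal_biclusters(
    matrix: Sequence[Sequence[int]], min_rows: int = 1, min_cols: int = 1
) -> List[Bicluster]:
    """Enumerate inclusion-maximal all-ones biclusters of a 0/1 matrix.

    Hypothetical helper: every maximal column set is an intersection of row
    supports, so the supports are closed under pairwise intersection and each
    resulting column set is paired with all rows that contain it.
    """
    if not matrix:
        return []
    n_cols = len(matrix[0])
    # Support of a row = set of columns (positions) where it has a 1.
    supports = [frozenset(j for j in range(n_cols) if row[j]) for row in matrix]
    closed = {s for s in supports if s}
    frontier = set(closed)
    while frontier:
        new = set()
        for a in frontier:
            for b in closed:
                c = a & b
                if c and c not in closed:
                    new.add(c)
        closed |= new
        frontier = new
    result: List[Bicluster] = []
    for cols in closed:
        # All rows that have a 1 in every selected column.
        rows = frozenset(i for i, s in enumerate(supports) if cols <= s)
        if len(rows) >= min_rows and len(cols) >= min_cols:
            result.append((rows, cols))
    # Largest biclusters first; ties broken by column indices for determinism.
    result.sort(key=lambda bc: (-len(bc[0]) * len(bc[1]), sorted(bc[1])))
    return result
```

Each returned pair names a set of sequences that share mutations at every position in the paired column set. This kind of grouping could be compared against the drug-resistance predictions of a lookup table.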