Learning Multiple Evolutionary Pathways from Cross-Sectional Data

We introduce a mixture model of trees to describe evolutionary processes that are characterized by the ordered accumulation of permanent genetic changes. The basic building block of the model is a directed weighted tree that generates a probability distribution on the set of all patterns of genetic events. We present an EM-like algorithm for learning a mixture model of K trees and show how to determine K with a maximum likelihood approach. As a case study, we consider the accumulation of mutations in the HIV-1 reverse transcriptase that are associated with drug resistance. The fitted model is statistically validated as a density estimator, and the stability of the model topology is analyzed. We obtain a generative probabilistic model for the development of drug resistance in HIV that agrees with biological knowledge. Further applications and extensions of the model are discussed.
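As an illustration of how a directed weighted tree can induce a distribution over patterns of genetic events, the following minimal Python sketch assumes the usual oncogenetic-tree convention: each edge weight is the conditional probability that an event occurs given that its parent event has occurred, and a pattern containing an event without its parent has probability zero under that tree. The event names `A`, `B`, `C`, the root label `wt`, all numeric weights, and the function names are hypothetical and chosen only for the example. The `responsibilities` function is the standard E-step of a finite mixture, not necessarily the exact update used by the EM-like algorithm of the paper.

```python
from typing import Dict, FrozenSet, List, Tuple

ROOT = "wt"  # hypothetical label for the root (no events accumulated yet)

# A tree maps each event to (parent, weight), where weight is assumed to be
# the conditional probability of the event given that its parent occurred.
Tree = Dict[str, Tuple[str, float]]


def tree_probability(tree: Tree, pattern: FrozenSet[str]) -> float:
    """Probability of observing exactly the events in `pattern` under one tree."""
    prob = 1.0
    for event, (parent, weight) in tree.items():
        parent_present = parent == ROOT or parent in pattern
        if event in pattern:
            if not parent_present:
                return 0.0  # event cannot occur before its parent
            prob *= weight
        elif parent_present:
            prob *= 1.0 - weight  # event was reachable but did not occur
    return prob


def mixture_probability(trees: List[Tree], weights: List[float],
                        pattern: FrozenSet[str]) -> float:
    """Weighted sum of the K tree probabilities for one pattern."""
    return sum(w * tree_probability(t, pattern) for w, t in zip(weights, trees))


def responsibilities(trees: List[Tree], weights: List[float],
                     pattern: FrozenSet[str]) -> List[float]:
    """Standard mixture E-step: posterior weight of each tree for one pattern."""
    joint = [w * tree_probability(t, pattern) for w, t in zip(weights, trees)]
    total = sum(joint)
    if total == 0.0:
        raise ValueError("pattern has zero probability under every component")
    return [j / total for j in joint]


# Hypothetical two-component mixture over three events.
star = {"A": (ROOT, 0.3), "B": (ROOT, 0.3), "C": (ROOT, 0.3)}   # independent events
chain = {"A": (ROOT, 0.5), "B": ("A", 0.4), "C": (ROOT, 0.2)}   # B requires A
trees = [star, chain]
weights = [0.2, 0.8]

if __name__ == "__main__":
    print(tree_probability(chain, frozenset({"A"})))   # approximately 0.24
    print(tree_probability(chain, frozenset({"B"})))   # 0.0, since A is missing
    print(responsibilities(trees, weights, frozenset({"B"})))
```

In this sketch the star-shaped component assigns positive probability to every pattern, so a pattern that violates the ordering of the chain tree (such as `B` without `A`) is still explained by the mixture rather than receiving probability zero.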
