Bayesian inference of viral recombination: topology distance between DNA segments and its distribution

Phylogenetic inference is the problem of reconstructing the ancestral relationships among a group of DNA or protein sequences. These relationships are classically represented by a phylogenetic tree. The sequences may come from different species, from different genes of the same species, or both, and the underlying assumption is that they share a common ancestor. We would like to collect and analyze long genomic sequences in order to achieve consistency, that is, the guarantee that the estimate approaches the true phylogeny as more data become available. The complication is that genome sizes are naturally limited, and organisms can also exchange genetic material with one another. When that happens, a single tree topology no longer describes the whole sequence accurately.

One example of such an exchange is recombination. In HIV-1, the reverse transcriptase switches RNA templates on average 3 times per replication cycle, yielding an average of about one recombinational strand-transfer event per 3000 base pairs. A similar rate has been reported in HIV-2 and in murine leukemia viruses. Recombination has also been found to play a role in severe acute respiratory syndrome (SARS) coronaviruses, hepatitis viruses, enteroviruses and other primate lentiviruses. Recombination can lead to the emergence of mutants resistant to multiple drugs. It may also increase the chance that mutation-free individuals arise in a population carrying deleterious mutations. Reassortment is a similar type of genetic exchange in RNA viruses with segmented genomes, in which whole RNA segments are swapped between co-infecting viruses. It is responsible for antigenic shift in influenza A viruses.
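The HIV-1 rate quoted above shows why long sequences and recombination pull in opposite directions. The following is a minimal sketch that assumes strand-transfer breakpoints fall along the sequence as a homogeneous Poisson process with rate 1/3000 per base pair per replication cycle. The Poisson model and the function names are illustrative assumptions, not the method of this paper. Under this model, the expected number of breakpoints grows linearly with segment length, while the probability that a segment is breakpoint-free decays exponentially. A segment of about 9000 bp carries on average three events, which matches the three template switches per cycle.

```python
import math

# Rate taken from the text: about one strand-transfer event per 3000 bp
# per replication cycle in HIV-1. Treating breakpoints as a homogeneous
# Poisson process along the sequence is a simplifying assumption.
RATE_PER_BP = 1.0 / 3000.0


def expected_breakpoints(length_bp: int, rate: float = RATE_PER_BP) -> float:
    """Expected number of breakpoints in a segment of the given length (one cycle)."""
    return rate * length_bp


def prob_no_breakpoint(length_bp: int, rate: float = RATE_PER_BP) -> float:
    """Poisson probability that the segment contains no breakpoint (one cycle)."""
    return math.exp(-rate * length_bp)


if __name__ == "__main__":
    # Longer segments hold more phylogenetic signal but are less likely
    # to share a single tree topology along their whole length.
    for length in (300, 1000, 3000, 9000):
        print(
            f"{length:>5} bp: E[breakpoints] = {expected_breakpoints(length):.2f}, "
            f"P(no breakpoint) = {prob_no_breakpoint(length):.3f}"
        )
```

Recombination between two template copies only changes the inferred history when those copies differ. A detectable mosaic therefore arises less often than this per-cycle count suggests, and the sketch should be read as an upper bound on breakpoint density rather than a model of observed recombinants.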