Molecular Evolution and Phylogenetics

Evolution by natural selection shapes species populations through three primary mechanisms: populations can be altered over evolutionary time and speciate into separate branches, two previously distinct species can hybridize into one, or a lineage can end in extinction. Given the vast amount of time that has elapsed since life first emerged on this planet, a great many distinct species have evolved, all of them related to one another. Phylogenetics is the study of evolutionary relatedness among species and populations. Traditional phylogeny asks how species evolve and, before the advent of genomic data, relied mostly on morphological and physiological data (e.g., bone structure from fossils). We are interested in tackling phylogenetics from a different perspective: analyzing DNA sequence data to determine relationships between and among species. At its core, the goal is to detect evidence of natural selection in populations.

This is an increasingly important area of research in computational biology, and it is starting to find commercial applications in personal genomics: it was recently announced that a joint MIT- and Harvard-affiliated company was established to sequence individual genomes for $5000, and other private companies, including 23andMe and deCODEme, are already doing this. We will formulate this biological problem in computational terms by studying two probabilistic models of divergence, Jukes-Cantor and Kimura. Two purely algorithmic approaches, UPGMA and Neighbor-Joining, will then be introduced to build species or gene trees from these relatedness data (the distinction between species trees and gene trees is explained below).
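To make "probabilistic model of divergence" concrete, here is a minimal Python sketch of the distance correction usually associated with the Jukes-Cantor model. It assumes two equal-length, gap-free aligned DNA sequences, and the function names `p_distance` and `jukes_cantor_distance` are hypothetical illustrations, not part of any library. The raw fraction of mismatched sites, p, underestimates the true number of substitutions because several changes can hit the same site over time. Jukes-Cantor corrects for this with d = -(3/4) ln(1 - (4/3) p), which is undefined once p reaches 3/4.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Fraction of aligned sites at which two sequences differ.

    Assumes a gap-free alignment of equal-length sequences (an illustrative
    simplification, not a general-purpose implementation).
    """
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = sum(a != b for a, b in zip(seq1, seq2))
    return mismatches / len(seq1)


def jukes_cantor_distance(seq1: str, seq2: str) -> float:
    """Estimated substitutions per site under the Jukes-Cantor model."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        # Saturation: the log argument is <= 0, so the correction is undefined.
        return math.inf
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


if __name__ == "__main__":
    a, b = "ACGTACGTAC", "ACGTTCGTAA"
    print(p_distance(a, b))             # raw mismatch fraction
    print(jukes_cantor_distance(a, b))  # corrected distance
```

For example, `jukes_cantor_distance("ACGTACGTAC", "ACGTTCGTAA")` sees 2 mismatches in 10 sites (p = 0.2) and returns roughly 0.233 substitutions per site, slightly more than p, reflecting hidden multiple substitutions. Kimura's model refines this by treating transitions and transversions separately. Pairwise distances like these are the kind of input that distance-based tree builders such as UPGMA and Neighbor-Joining consume.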
Among the many open problems in phylogenetics that genomics now lets us address are how similar two species are; what migration paths early humans took when they first left the African continent, which can be studied through variation in the same genes among a number of local tribes around the world (National Geographic's Genographic Project is one such example); and which living species is our closest cousin (chimpanzee or gorilla?). Many questions in evolutionary biology have already been answered by genomic phylogenetics; a major recent example is the revelation that the closest living relative of the whale is the hippopotamus.