Of mice and men: algorithms for evolutionary distances between genomes with translocation

Work carried out while the authors were at the Department of Computer Science of the University of California, Davis.
Department of Computer Science, The University of Georgia, Athens, GA 30602. Electronic mail: kece@cs.uga.edu. This research was supported by a DOE Human Genome Distinguished Postdoctoral Fellowship.
Department of Computer Science, Princeton University, Princeton, NJ 08544. Electronic mail: ravi@cs.princeton.edu. Research supported by DOE Grant DE-FG03-90ER60999 and a DIMACS Postdoctoral Fellowship.

A new and largely unexplored area of computational biology is that of combinatorial algorithms for genome rearrangement. In the course of its evolution, the genome of an organism mutates by processes that can rearrange whole segments of a chromosome in a single event. These rearrangement mechanisms include inversion, transposition, duplication, and translocation, and a basic problem is to determine the minimum number of such events that transform one genome into another. This number is called the rearrangement distance between the two genomes. It gives a lower bound on the number of events that must have occurred since their divergence, assuming evolution proceeds only by the processes under study. In this paper, we begin the algorithmic study of genome rearrangement by translocation. A translocation exchanges material at the ends of two chromosomes within a genome. We model this as a process that exchanges prefixes and suffixes of strings, where each string represents a sequence of distinct markers along a chromosome in the genome. For the general problem of determining the translocation distance between two such sets of strings, we present a 2-approximation algorithm. For a theoretical model in which the exchanged substrings are of equal length, we derive an optimal algorithm for translocation distance. We also examine, for the first time, two types of rearrangements in concert. An inversion reverses the order of markers within a substring and flips the orientation of the markers. For genomes that have evolved by translocation and inversion, we show that there is a simple 2-approximation algorithm for data in which the orientation of markers is unknown, and a 3/2-approximation algorithm when orientation is known. These results take a step toward extending the area from the analysis of simple organisms, whose genomes consist of a single chromosome and whose evolution has largely involved a single type of rearrangement event, to the analysis of organisms such as man and mouse, whose genomes contain many chromosomes and whose history since divergence has largely consisted of inversion and translocation events.
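The string model of translocation and inversion described in the abstract can be made concrete with a small sketch. This is not the paper's 2-approximation, 3/2-approximation, or optimal algorithm. It computes the rearrangement distance directly from its definition by exhaustive breadth-first search, which is exponential and only usable on toy genomes. The representation (a genome as a tuple of chromosomes, each a tuple of distinct signed integer markers) and the names `canonical`, `translocations`, `inversions`, and `rearrangement_distance` are hypothetical. The sketch also assumes that a translocation exchanges the suffixes of two chromosomes (s1 s2 and t1 t2 become s1 t2 and t1 s2), never produces an empty chromosome, and that each chromosome is read in a fixed direction.

```python
from collections import deque
from itertools import chain
from typing import Iterator

# Hypothetical representation: a genome is a tuple of chromosomes, and each
# chromosome is a tuple of distinct signed integer markers (sign = orientation).
Genome = tuple[tuple[int, ...], ...]


def canonical(genome: Genome) -> Genome:
    """Treat a genome as an unordered collection of chromosomes."""
    return tuple(sorted(genome))


def translocations(genome: Genome) -> Iterator[Genome]:
    """Yield every genome reachable by one translocation.

    Assumed model: chromosomes s = s1 s2 and t = t1 t2 become s1 t2 and
    t1 s2 (the suffixes are exchanged); results may not be empty.
    """
    for a in range(len(genome)):
        for b in range(a + 1, len(genome)):
            s, t = genome[a], genome[b]
            for i in range(len(s) + 1):
                for j in range(len(t) + 1):
                    new_s, new_t = s[:i] + t[j:], t[:j] + s[i:]
                    if not new_s or not new_t:
                        continue
                    rest = [c for k, c in enumerate(genome) if k not in (a, b)]
                    yield canonical(tuple(rest) + (new_s, new_t))


def inversions(genome: Genome) -> Iterator[Genome]:
    """Yield every genome reachable by one inversion: reverse a substring of
    one chromosome and flip the orientation (sign) of each marker in it."""
    for a, chrom in enumerate(genome):
        for i in range(len(chrom)):
            for j in range(i + 1, len(chrom) + 1):
                flipped = tuple(-m for m in reversed(chrom[i:j]))
                new = chrom[:i] + flipped + chrom[j:]
                yield canonical(genome[:a] + (new,) + genome[a + 1:])


def rearrangement_distance(source: Genome, target: Genome,
                           use_inversions: bool = False) -> int:
    """Minimum number of events turning source into target, by exhaustive
    breadth-first search. Exponential in the number of markers."""
    start, goal = canonical(source), canonical(target)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        g = queue.popleft()
        if g == goal:
            return dist[g]
        moves = translocations(g)
        if use_inversions:
            moves = chain(moves, inversions(g))
        for nxt in moves:
            if nxt not in dist:
                dist[nxt] = dist[g] + 1
                queue.append(nxt)
    raise ValueError("target is unreachable from source under this model")
```

For example, exchanging the suffixes (3) and (6) of (1, 2, 3) and (4, 5, 6) yields (1, 2, 6) and (4, 5, 3) in one event, while swapping only the middle markers 2 and 5 takes two translocations under this model. Breadth-first search is used because every event has unit cost, so the first time the target is dequeued its recorded depth is the minimum number of events.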