* Work carried out while the authors were at the Department of Computer Science of the University of California, Davis.
† Department of Computer Science, The University of Georgia, Athens, GA 30602. Electronic mail: kece@cs.uga.edu. This research was supported by a DOE Human Genome Distinguished Postdoctoral Fellowship.
‡ Department of Computer Science, Princeton University, Princeton, NJ 08544. Electronic mail: ravi@cs.princeton.edu. Research supported by DOE Grant DEFG03-90ER60999 and a DIMACS Postdoctoral Fellowship.

A new and largely unexplored area of computational biology is combinatorial algorithms for genome rearrangement. In the course of its evolution, the genome of an organism mutates by processes that can rearrange whole segments of a chromosome in a single event. These rearrangement mechanisms include inversion, transposition, duplication, and translocation, and a basic problem is to determine the minimum number of such events that transform one genome into another. This number is called the rearrangement distance between the two genomes, and gives a lower bound on the number of events that must have occurred since their divergence, assuming evolution proceeds according to the processes of the study.

In this paper, we begin the algorithmic study of genome rearrangement by translocation. A translocation exchanges material at the ends of two chromosomes within a genome. We model this as a process that exchanges prefixes and suffixes of strings, where each string represents a sequence of distinct markers along a chromosome in the genome. For the general problem of determining the translocation distance between two such sets of strings, we present a 2-approximation algorithm. For a theoretical model in which the exchanged substrings are of equal length, we derive an optimal algorithm for translocation distance.

We also examine for the first time two types of rearrangements in concert. An inversion reverses the order of markers within a substring, and flips the orientation of the markers. For genomes that have evolved by translocation and inversion, we show there is a simple 2-approximation algorithm for data in which the orientation of markers is unknown, and a 3/2-approximation algorithm when orientation is known.

These results take a step towards extending the area from the analysis of simple organisms, whose genomes consist of a single chromosome, and whose evolution has largely involved a single type of rearrangement event, to the analysis of organisms such as man and mouse, whose genomes contain many chromosomes, and whose history since divergence has largely consisted of inversion and translocation events.
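To make the two operations concrete, the following is a minimal sketch in Python, not the algorithms of the paper. It assumes a chromosome is a list of distinct integer markers and that the sign of an integer encodes marker orientation. The function names `translocate` and `invert` and the cut-point parameters are hypothetical, chosen only for illustration, and only the variant of translocation that swaps the suffixes of two chromosomes is shown.

```python
def translocate(x, y, i, j):
    """Swap the suffix of x starting at index i with the suffix of y starting at j.

    Illustrative assumption: this covers only the suffix-for-suffix exchange.
    Returns the two new chromosomes; the inputs are left unchanged.
    """
    return x[:i] + y[j:], y[:j] + x[i:]


def invert(chrom, i, j):
    """Reverse the order of markers in chrom[i:j] and flip their orientation.

    Orientation is modelled (as an assumption) by the sign of each marker.
    """
    segment = [-m for m in reversed(chrom[i:j])]
    return chrom[:i] + segment + chrom[j:]


a = [1, 2, 3, 4]
b = [5, 6, 7]

a2, b2 = translocate(a, b, 2, 1)
print(a2, b2)   # [1, 2, 6, 7] [5, 3, 4]

print(invert(a, 1, 3))   # [1, -3, -2, 4]
```

Under this encoding, a rearrangement distance is the length of a shortest sequence of such calls that turns one set of lists into another. The sketch only applies single events and makes no attempt to compute that minimum.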