The Mutated Subsequence Problem and Locating Conserved Genes

Motivation: To locate conserved genes on a whole-genome scale, this paper proposes a new structural optimization problem, the Mutated Subsequence Problem, which accounts for possible mutations between two species (in the form of reversals and transpositions) when comparing their genomes.

Results: A practical algorithm, the mutated subsequence algorithm (MSS), is devised to solve this optimization problem. It has been evaluated on different pairs of human and mouse chromosomes and on different pairs of Baculoviridae virus genomes. MSS is found to be effective and efficient; in particular, it reveals more than 90% of the conserved genes of human and mouse that have been reported in the literature. Compared with the existing software tools MUMmer and MaxMinCluster, MSS uncovers on average 14% and 7% more genes, respectively. Furthermore, this paper presents a hybrid approach that integrates MUMmer or MaxMinCluster with MSS and achieves better performance and reliability.
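For intuition about why reversals and transpositions matter, here is a minimal sketch of the order-preserving baseline that the Mutated Subsequence Problem moves beyond. It is not the MSS algorithm. It selects the largest set of matched anchors (for example, genes or maximal unique matches) that appear in the same order in both genomes, using a longest increasing subsequence (LIS); MUMmer's original whole-genome alignment relied on a variant of this LIS step. The `(pos_a, pos_b)` anchor representation and the name `longest_colinear_chain` are hypothetical, chosen for illustration only.

```python
from bisect import bisect_left


def longest_colinear_chain(anchors):
    """Hypothetical helper: largest subset of anchors whose positions increase in both genomes.

    anchors: list of (pos_a, pos_b) pairs for matched genes or MUMs.
    Illustrates plain colinear chaining (LIS in O(n log n)), NOT the MSS algorithm.
    """
    # Sort by position in genome A; the LIS is then taken over positions in genome B.
    ordered = sorted(anchors)
    tails = []       # tails[k]: index of the smallest-ending chain of length k + 1
    tail_vals = []   # pos_b of those chain ends, kept sorted for binary search
    prev = [-1] * len(ordered)  # back-pointers used to rebuild the chain
    for i, (_, b) in enumerate(ordered):
        k = bisect_left(tail_vals, b)  # strictly increasing in genome B
        if k > 0:
            prev[i] = tails[k - 1]
        if k == len(tails):
            tails.append(i)
            tail_vals.append(b)
        else:
            tails[k] = i
            tail_vals[k] = b
    # Walk the back-pointers from the end of the longest chain.
    chain = []
    i = tails[-1] if tails else -1
    while i != -1:
        chain.append(ordered[i])
        i = prev[i]
    return chain[::-1]


# Eight genes in the same order, except genes 4-6 are reversed in genome B.
anchors = [(1, 1), (2, 2), (3, 3), (4, 6), (5, 5), (6, 4), (7, 7), (8, 8)]
print(longest_colinear_chain(anchors))
# → [(1, 1), (2, 2), (3, 3), (6, 4), (7, 7), (8, 8)]
```

Only one gene from the reversed block survives: the strictly colinear chain drops genes 4 and 5 even though they are conserved. Losses of this kind under reversals and transpositions are what motivate a formulation that tolerates such mutations.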
