Identifying Evolutionarily Conserved Segments Among Multiple Divergent and Rearranged Genomes

We describe a new method for reliably identifying conserved segments among genome sequences that have undergone rearrangement, horizontal transfer, and substantial nucleotide-level divergence. A Gibbs-like sampler explores different combinations of sequence-based markers shared by the genomes under study and assigns each marker a posterior probability based on how frequently it participates in a collinear group of markers. Markers with high posterior probability are likely members of conserved segments. The method identifies both large-scale and local trends in segmental collinearity, providing suitable input for genome alignment and rearrangement history inference tools. Applying the method to the genomes of four Streptococci reveals that rearranged segments in these organisms fall into two size categories: large conserved segments, and small segments the size of a single gene or operon that interrupt them in a staccato pattern. The rearrangement pattern of the large segments is best explained by symmetric inversions about the origin of replication, while the pattern of the small segments is not.
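The idea of scoring a marker by how often it takes part in a collinear group can be illustrated with a toy sketch. This is not the sampler described here, whose details the abstract does not give. It assumes only two genomes, markers given as hypothetical (position in genome A, position in genome B) pairs, forward-orientation collinearity only (inverted segments are ignored), and a simple auto-logistic model with made-up parameters `alpha`, `beta`, and `window`. All function names are illustrative.

```python
import math
import random


def collinear_neighbors(markers, window):
    """For each marker, list the markers that lie within `window` in both
    genomes and appear in the same relative order in both (assumed rule)."""
    n = len(markers)
    nbrs = [[] for _ in range(n)]
    for i in range(n):
        ai, bi = markers[i]
        for j in range(i + 1, n):
            aj, bj = markers[j]
            da, db = aj - ai, bj - bi
            close = 0 < abs(da) <= window and 0 < abs(db) <= window
            if close and (da > 0) == (db > 0):
                nbrs[i].append(j)
                nbrs[j].append(i)
    return nbrs


def inclusion_frequencies(markers, window=25, alpha=2.0, beta=1.5,
                          sweeps=2000, burn_in=200, seed=0):
    """Gibbs sampling over include/exclude indicators, one per marker.

    Conditional log-odds of including marker i (toy model, an assumption):
        -alpha + beta * (number of currently included collinear neighbors)
    Returns the fraction of post-burn-in sweeps in which each marker was
    included, used here as a stand-in for a posterior probability.
    """
    rng = random.Random(seed)
    nbrs = collinear_neighbors(markers, window)
    n = len(markers)
    included = [True] * n
    counts = [0] * n
    for sweep in range(burn_in + sweeps):
        for i in range(n):
            k = sum(included[j] for j in nbrs[i])
            p = 1.0 / (1.0 + math.exp(alpha - beta * k))
            included[i] = rng.random() < p
        if sweep >= burn_in:
            for i in range(n):
                counts[i] += included[i]
    return [c / sweeps for c in counts]


if __name__ == "__main__":
    # Ten markers on a diagonal (one collinear run) plus three isolated ones.
    chain = [(10 * i, 10 * i) for i in range(10)]
    noise = [(5, 95), (45, 5), (85, 30)]
    for marker, freq in zip(chain + noise, inclusion_frequencies(chain + noise)):
        print(marker, round(freq, 2))
```

In this toy setting, markers that sit in a collinear run reinforce one another and are included in most sweeps, while isolated markers are included only at the low baseline rate set by `alpha`. Thresholding the resulting frequencies would then separate likely segment members from spurious matches.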
