The Incompatible Desiderata of Gene Cluster Properties

There is widespread interest in comparative genomics in determining whether historically and/or functionally related genes are spatially clustered in a genome, and whether the same sets of genes reappear in clusters in two or more genomes. We formalize and analyze the desirable properties of gene clusters and cluster definitions. Through detailed analysis of two commonly applied cluster definitions, r-windows and max-gap, we investigate the extent to which a single definition can embody all of these properties simultaneously. We show that many of the most important properties are difficult to satisfy within the same definition. We also examine whether one commonly assumed property, which we call nestedness, is satisfied by the structures present in real genomic data.
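To make the two cluster types named above concrete, the following is a minimal sketch under simplifying assumptions, not the paper's formal definitions. It assumes a single genome represented as integer gene positions, a set of "marked" genes of interest, a max-gap parameter `g` (the largest number of intervening genes allowed between adjacent marked genes), and an r-window criterion of at least `k` marked genes in a window of `r` consecutive genes. The function names and parameters are hypothetical and chosen only for illustration.

```python
def max_gap_clusters(marked_positions, g):
    """Group marked gene positions into maximal runs in which adjacent
    marked genes are separated by at most g intervening genes.
    (Assumed single-genome reading of the max-gap idea.)"""
    clusters = []
    for pos in sorted(marked_positions):
        # Number of genes strictly between this gene and the previous one.
        if clusters and pos - clusters[-1][-1] - 1 <= g:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


def r_window_hits(marked_positions, genome_length, r, k):
    """Return the start index of every window of r consecutive genes that
    contains at least k marked genes.
    (Assumed single-genome reading of the r-window idea.)"""
    marked = set(marked_positions)
    hits = []
    for start in range(genome_length - r + 1):
        count = sum(1 for p in range(start, start + r) if p in marked)
        if count >= k:
            hits.append(start)
    return hits


# Hypothetical toy genome of 21 genes with six marked genes.
marked = [2, 3, 5, 11, 12, 20]
clusters = max_gap_clusters(marked, g=1)
windows = r_window_hits(marked, genome_length=21, r=4, k=3)
```

On this toy input, the max-gap grouping depends only on the spacing between adjacent marked genes, while the r-window test depends on density within a fixed-size window, so the two criteria can disagree about which regions qualify as clusters.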
