Using shared genomic synteny and shared protein functions to enhance the identification of orthologous gene pairs

MOTIVATION: The identification of orthologous gene pairs is generally based on sequence similarity. Gene pairs that are mutual 'best hits' between the genomes being compared are asserted to be orthologs. Although this method identifies most orthologous gene pairs with high confidence, it misses a fraction of them, especially genes in duplicated gene families. In addition, the approach depends heavily on the completeness and quality of gene annotation: when gene sequences are not correctly represented, it is unlikely to find the correct ortholog. To overcome these limitations, we have developed an approach that identifies orthologous gene pairs using shared chromosomal synteny and the annotation of protein function.

RESULTS: Assembled mouse and human genomes were used to identify regions of conserved synteny between the two genomes. 'Syntenic anchors' are conserved non-repetitive locations shared by the mouse and human genomes. Using these anchors, we identified blocks of sequence that contain consistently ordered anchors between the two genomes (syntenic blocks). This synteny information was then used to help identify orthologous gene pairs between mouse and human. The approach combines the mutual selection of the best tBlastX hits between human and mouse transcripts with the inference of orthologous relationships from shared syntenic anchors, co-location in the same syntenic block and shared annotated protein function. Using this approach, we found 19,357 orthologous gene pairs between the human and mouse genomes, a 20% increase over the number of orthologs identified by conventional approaches.
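The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `block_of` and `function_of` lookups, the gene identifiers and the scores are all hypothetical, and the construction of syntenic blocks from anchors is assumed to have been done elsewhere. The sketch first selects mutual best hits, then accepts remaining candidate pairs only when both genes fall in the same syntenic block and carry the same function annotation.

```python
def best_hits(hits):
    """Map each query gene to its highest-scoring subject.

    hits: iterable of (query, subject, score) tuples, e.g. parsed from
    tBlastX output (the parsing itself is not shown here).
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def mutual_best_hits(human_vs_mouse, mouse_vs_human):
    """Return (human, mouse) pairs that are each other's best hit."""
    forward = best_hits(human_vs_mouse)
    reverse = best_hits(mouse_vs_human)
    return {(h, m) for h, m in forward.items() if reverse.get(m) == h}


def synteny_rescue(human_vs_mouse, paired, block_of, function_of):
    """Add pairs missed by the mutual-best-hit step.

    A candidate pair is accepted only if neither gene is already paired,
    both genes lie in the same syntenic block, and both share the same
    function annotation. block_of and function_of are hypothetical
    dictionaries keyed by gene identifier.
    """
    used_human = {h for h, _ in paired}
    used_mouse = {m for _, m in paired}
    rescued = set()
    # Consider stronger hits first so the best consistent candidate wins.
    for h, m, _score in sorted(human_vs_mouse, key=lambda hit: -hit[2]):
        if h in used_human or m in used_mouse:
            continue
        same_block = block_of.get(h) is not None and block_of.get(h) == block_of.get(m)
        same_function = (
            function_of.get(h) is not None
            and function_of.get(h) == function_of.get(m)
        )
        if same_block and same_function:
            rescued.add((h, m))
            used_human.add(h)
            used_mouse.add(m)
    return rescued


# Toy data (invented): family B is duplicated, so hB1 and mB1 are not
# mutual best hits even though they sit in the same syntenic block.
human_vs_mouse = [
    ("hA", "mA", 200), ("hB1", "mB2", 90), ("hB1", "mB1", 85),
    ("hB2", "mB2", 95), ("hB2", "mB1", 60),
]
mouse_vs_human = [
    ("mA", "hA", 200), ("mB1", "hB1", 85), ("mB1", "hB2", 60),
    ("mB2", "hB2", 95), ("mB2", "hB1", 90),
]
block_of = {"hA": "blk1", "mA": "blk1", "hB1": "blk7", "mB1": "blk7",
            "hB2": "blk9", "mB2": "blk9"}
function_of = {gene: "kinase" for gene in ("hB1", "hB2", "mB1", "mB2")}
function_of.update({"hA": "transporter", "mA": "transporter"})

mutual = mutual_best_hits(human_vs_mouse, mouse_vs_human)
rescued = synteny_rescue(human_vs_mouse, mutual, block_of, function_of)
print(sorted(mutual))
print(sorted(rescued))
```

In this toy case the mutual-best-hit step pairs `hA` with `mA` and `hB2` with `mB2`, while `hB1` and `mB1` are recovered only by the synteny and function check. Requiring both conditions is a conservative choice made for the sketch; the abstract does not specify how the criteria are weighted.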
