The Duplicated Genes Database: Identification and Functional Annotation of Co-Localised Duplicated Genes across Genomes

Background: There has been a surge in studies linking genome structure and gene expression, with a special focus on duplicated genes. Although they originate from the same sequence, duplicated genes can diverge strongly over the course of evolution and take on different functions or differently regulated expression. However, information on the function and expression of duplicated genes remains sparse. Identifying groups of duplicated genes in different genomes and characterizing their expression and function would therefore be of great interest to the research community. The ‘Duplicated Genes Database’ (DGD) was developed for this purpose.

Methodology: Nine species were included in the DGD. For each species, BLAST analyses were conducted on the peptide sequences corresponding to the genes mapped to the same chromosome. Groups of duplicated genes were defined based on these pairwise BLAST comparisons and the genomic location of the genes. For each group, Pearson correlations between gene expression data and semantic similarities between functional GO annotations were also computed when the relevant information was available.

Conclusions: The Duplicated Genes Database provides a list of co-localised, duplicated genes for several species, together with the gene co-expression level and the semantic similarity of functional annotations where these are available. Adding these data to the groups of duplicated genes provides biological information that can prove useful for gene expression analyses. The Duplicated Genes Database can be freely accessed through the DGD website at http://dgd.genouest.org.
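As an illustration of the methodology, the following Python sketch groups genes linked by pairwise similarity hits and computes a mean pairwise Pearson correlation of expression per group. It is a minimal sketch under stated assumptions, not the DGD pipeline: the abstract does not specify the grouping rule or the BLAST thresholds, so single-linkage clustering is assumed here, the hits are assumed to be already filtered for similarity and co-localisation on one chromosome, and all function names, gene identifiers, and expression values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def group_duplicates(hits):
    """Cluster genes connected by pairwise hits (assumed single-linkage rule)."""
    parent = {}

    def find(gene):
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving
            gene = parent[gene]
        return gene

    for a, b in hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for gene in list(parent):
        groups.setdefault(find(gene), set()).add(gene)
    return sorted(sorted(members) for members in groups.values())


def pearson(x, y):
    """Pearson correlation of two equal-length profiles; None if undefined."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return None  # a constant profile has no defined correlation
    return cov / (sd_x * sd_y)


def mean_coexpression(group, expression):
    """Mean pairwise Pearson r over group members with expression data."""
    genes = [g for g in group if g in expression]
    values = [pearson(expression[a], expression[b])
              for a, b in combinations(genes, 2)]
    values = [v for v in values if v is not None]
    return sum(values) / len(values) if values else None


# Hypothetical, pre-filtered hits between genes on one chromosome.
hits = [("g1", "g2"), ("g2", "g3"), ("g4", "g5")]
# Hypothetical expression profiles; g4 and g5 have no data.
expression = {"g1": [1, 2, 3], "g2": [2, 4, 6], "g3": [3, 2, 1]}

groups = group_duplicates(hits)
scores = [mean_coexpression(group, expression) for group in groups]
```

Returning `None` when fewer than two genes in a group have expression data mirrors the abstract's statement that correlations were computed only when the relevant information was available. The GO semantic similarity step is not sketched here because it depends on the ontology structure and on the similarity measure chosen.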
