CLOE: Identification of putative functional relationships among genes by comparison of expression profiles between two species

Background

Public repositories of microarray data contain a large amount of information that is potentially relevant for exploring functional relationships among genes through meta-analysis of expression profiles. However, widespread use of this resource by the scientific community is currently limited by the scarcity of effective analysis tools. Here we describe CLOE, a simple cDNA microarray data mining strategy based on meta-analysis of datasets from pairs of species. The method consists of ranking the EST probes in the datasets of the two species according to the similarity of their expression profiles to those of two EST probes from orthologous genes, and then extracting orthologous EST pairs from a given top interval of the ranked lists. The Gene Ontology annotation of the resulting candidate partners is then analyzed for keyword overrepresentation.

Results

We demonstrate the capabilities of the approach by testing its predictive power on three proteomically defined mammalian protein complexes, in comparison with single-species and multiple-species meta-analysis approaches. Our results show that CLOE can find candidate partners for a greater number of genes than multiple-species co-expression analysis, while retaining comparable specificity even when applied to species as closely related as mouse and human. It is also much more specific than single-species co-expression analysis, strongly reducing the number of potential candidate partners for a given gene of interest.

Conclusions

CLOE is a simple and effective data mining approach that can easily be used for meta-analysis of cDNA microarray experiments characterized by very heterogeneous coverage. Importantly, for a gene of interest it produces an average number of high-confidence putative partners that is within the range of standard experimental validation techniques.
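The ranking-and-intersection step described in the Background can be illustrated with a minimal Python sketch. It is not the authors' implementation: the use of Pearson correlation as the similarity measure, the function names `rank_by_correlation` and `cloe_candidates`, the `top_n` cutoff, and the toy probe identifiers and values are all assumptions made for illustration. The toy matrices are complete, whereas real cDNA datasets with heterogeneous coverage would need missing-value handling.

```python
import numpy as np

def rank_by_correlation(profiles, probe_ids, seed_id):
    """Rank probes by decreasing Pearson correlation with the seed probe.

    profiles: 2-D array, one row per probe, one column per experiment.
    Pearson correlation is an assumed similarity measure for this sketch.
    """
    idx = probe_ids.index(seed_id)
    corr = np.corrcoef(profiles)[idx]   # correlation of the seed row with every row
    order = np.argsort(-corr)           # highest correlation first
    return [probe_ids[i] for i in order if i != idx]

def cloe_candidates(ranked_a, ranked_b, orthologs, top_n):
    """Keep orthologous pairs whose members both lie in the top interval."""
    top_a = set(ranked_a[:top_n])
    top_b = set(ranked_b[:top_n])
    return [(a, b) for a, b in orthologs if a in top_a and b in top_b]

# Hypothetical toy data: four probes per species, seed probes hA1 and mB1
# standing in for two EST probes from orthologous genes.
ids_a = ["hA1", "hA2", "hA3", "hA4"]
data_a = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],    # hA1 (seed)
    [2.0, 4.0, 6.0, 8.0, 10.0],   # hA2: strongly co-expressed with the seed
    [5.0, 4.0, 3.0, 2.0, 1.0],    # hA3: anti-correlated
    [1.0, 3.0, 2.0, 5.0, 4.0],    # hA4: moderately co-expressed
])
ids_b = ["mB1", "mB2", "mB3", "mB4"]
data_b = np.array([
    [1.0, 2.0, 3.0, 4.0],         # mB1 (seed)
    [1.0, 2.0, 3.0, 5.0],         # mB2: strongly co-expressed with the seed
    [4.0, 3.0, 2.0, 1.0],         # mB3: anti-correlated
    [1.0, 3.0, 2.0, 4.0],         # mB4: moderately co-expressed
])
# Hypothetical orthology table linking probes across the two species.
orthologs = [("hA2", "mB2"), ("hA3", "mB4"), ("hA4", "mB3")]

ranked_a = rank_by_correlation(data_a, ids_a, "hA1")
ranked_b = rank_by_correlation(data_b, ids_b, "mB1")
candidates = cloe_candidates(ranked_a, ranked_b, orthologs, top_n=2)
print(candidates)
```

In this toy case `hA4` and `mB4` each fall in the top interval of their own species, but their orthologs do not, so only the pair that is co-expressed with the seed in both species is retained. This mirrors the gain in specificity over single-species analysis reported in the Results.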
