From genes to functional classes in the study of biological systems

With the popularisation of high-throughput techniques, the need for procedures that help in the biological interpretation of results has increased enormously. Recently, new procedures inspired by systems biology criteria have started to be developed. Here we present FatiScan, a web-based program that implements a threshold-independent test for the functional interpretation of large-scale experiments. The test does not require genes to be pre-selected by applying an independent test to each gene and then imposing a significance cut-off. Instead of focusing on single genes, it directly tests the behaviour of blocks of functionally related genes. In addition, the test does not depend on the type of data used to obtain the significance values. Consequently, different types of biologically informative terms (gene ontology, pathways, functional motifs, transcription factor binding sites or regulatory sites from cisRED) can be applied to different classes of genome-scale studies. We exemplify its application in microarray gene expression, evolution and interactomics. Methods for gene set enrichment that are, in addition, independent of the original data and experimental design constitute a promising alternative for the functional profiling of genome-scale experiments. A web server that performs the test described, along with other similar ones, can be found at http://www.babelomics.org.
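The idea of testing a block of functionally related genes over a ranked list, without fixing a single significance threshold, can be illustrated with a small sketch. This is not FatiScan's implementation: the function names (`fisher_two_sided`, `scan_ranked_list`), the evenly spaced cut points and the choice of a two-sided Fisher exact test are all assumptions made for illustration. The sketch takes a list of genes already ranked by any experimental value, cuts it at several positions, and at each cut asks whether a functional term is unevenly distributed between the upper and lower parts.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: upper / lower part of the ranked list.
    Columns: annotated / not annotated with the term.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x annotated genes in the upper part.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - (n - row1))
    hi = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def scan_ranked_list(ranked_genes, annotated, n_partitions=4):
    """Return (smallest p-value, cut index) over evenly spaced cut points.

    ranked_genes: gene identifiers sorted by any experimental value (assumed input).
    annotated: set of genes carrying the functional term under test.
    """
    n = len(ranked_genes)
    best_p, best_cut = 1.0, None
    for k in range(1, n_partitions + 1):
        cut = k * n // (n_partitions + 1)
        upper, lower = ranked_genes[:cut], ranked_genes[cut:]
        a = sum(g in annotated for g in upper)
        c = sum(g in annotated for g in lower)
        p = fisher_two_sided(a, len(upper) - a, c, len(lower) - c)
        if p < best_p:
            best_p, best_cut = p, cut
    return best_p, best_cut


# Hypothetical example: 20 ranked genes, the term annotates the top five.
genes = [f"g{i}" for i in range(20)]
term_genes = {"g0", "g1", "g2", "g3", "g4"}
p_value, cut = scan_ranked_list(genes, term_genes, n_partitions=3)
```

Because only the ranking enters the calculation, the same sketch applies whether the genes were ordered by differential expression, by an evolutionary rate or by a network property. In a real analysis the p-values obtained across many terms and cut points would still need a multiple-testing correction, which is omitted here.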
