A novel method to quantify gene set functional association based on gene ontology

Numerous gene sets have been used as molecular signatures for exploring the genetic basis of complex disorders. In many cases these gene sets are distinct but related to each other, so efforts have been made to compare them, for example to evaluate the reproducibility of different experiments. Comparing gene sets in terms of biological function has proven helpful to biologists. We improved a semantic similarity measure to quantify the functional association between gene sets in the context of Gene Ontology (GO) and developed a web toolkit named Gene Set Functional Similarity (GSFS; http://bioinfo.hrbmu.edu.cn/GSFS). Validation on protein complexes with known functional associations showed that GSFS scores tend to correlate with sequence similarity scores and that complexes with high GSFS scores tend to be involved in the same functional catalogue. Compared with the pairwise method and the annotation method, GSFS shows better discrimination and more accurately reflects the known functional catalogues shared between complexes. Case studies that compared differentially expressed genes of prostate tumour samples across microarray platforms and identified coronary heart disease susceptibility pathways suggest that the method could contribute to future studies exploring the molecular basis of complex disorders.
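
To make the idea of ontology-based gene set comparison concrete, the sketch below computes a generic information-content similarity between two gene sets. It is not the GSFS algorithm, whose details are not given here; it assumes a Resnik-style term similarity combined with a best-match average over genes, which is one common baseline. The ontology (`PARENTS`), the annotations (`ANNOTATIONS`), and all term and gene names are hypothetical toy data.

```python
import math

# Hypothetical toy ontology: each term maps to its direct parents.
PARENTS = {
    "root": [],
    "metabolism": ["root"],
    "signaling": ["root"],
    "lipid_metabolism": ["metabolism"],
    "glucose_metabolism": ["metabolism"],
    "mapk_cascade": ["signaling"],
}

# Hypothetical toy annotations: each gene maps to its directly annotated terms.
ANNOTATIONS = {
    "g1": {"lipid_metabolism"},
    "g2": {"glucose_metabolism"},
    "g3": {"mapk_cascade"},
    "g4": {"lipid_metabolism"},
}


def ancestors(term):
    """Return the term itself plus all of its ancestors in the toy ontology."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log(fraction of genes annotated to t or to a descendant of t)."""
    counts = {term: 0 for term in PARENTS}
    for terms in ANNOTATIONS.values():
        covered = set()
        for term in terms:
            covered |= ancestors(term)
        for term in covered:
            counts[term] += 1
    total = len(ANNOTATIONS)
    return {t: -math.log(c / total) for t, c in counts.items() if c > 0}


IC = information_content()


def resnik(term_a, term_b):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(term_a) & ancestors(term_b)
    return max((IC[t] for t in common if t in IC), default=0.0)


def gene_similarity(gene_a, gene_b):
    """Maximum term-level similarity over all annotation pairs of two genes."""
    return max(
        (resnik(a, b) for a in ANNOTATIONS[gene_a] for b in ANNOTATIONS[gene_b]),
        default=0.0,
    )


def set_similarity(set_a, set_b):
    """Best-match average: each gene is scored against its closest gene in the other set."""
    best_a = [max(gene_similarity(a, b) for b in set_b) for a in set_a]
    best_b = [max(gene_similarity(a, b) for a in set_a) for b in set_b]
    return (sum(best_a) / len(best_a) + sum(best_b) / len(best_b)) / 2


if __name__ == "__main__":
    print(set_similarity({"g1", "g2"}, {"g4"}))  # sets sharing metabolic terms
    print(set_similarity({"g1", "g2"}, {"g3"}))  # sets sharing only the root term
```

In this toy setting, two sets that share specific terms score higher than two sets whose only common ancestor is the root. That ordering is the kind of behaviour a functional similarity score is expected to show.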
