Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles

Although genome-wide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.
