Down-weighting overlapping genes improves gene set analysis

Background

The identification of gene sets that are significantly impacted in a given condition based on microarray data is a crucial step in current life science research. Most gene set analysis methods treat all genes equally, regardless of how specific they are to a given gene set.

Results

In this work we propose a new gene set analysis method that computes a gene set score as the mean of absolute values of weighted moderated gene t-scores. The gene weights are designed to emphasize genes that appear in few gene sets over genes that appear in many gene sets. We demonstrate the usefulness of the method when analyzing gene sets that correspond to the KEGG pathways, and hence we call our method Pathway Analysis with Down-weighting of Overlapping Genes (PADOG). Most gene set analysis methods are validated through the analysis of two or three data sets followed by a human interpretation of the results. In contrast, the validation employed here uses 24 different data sets and a completely objective assessment scheme that makes minimal assumptions and eliminates the need for possibly biased human assessments of the analysis results.

Conclusions

PADOG significantly improves gene set ranking and boosts the sensitivity of the analysis using information already available in the gene expression profiles and in the collection of gene sets to be analyzed. The advantages of PADOG over other existing approaches are shown to be stable to changes in the database of gene sets to be analyzed. PADOG was implemented as an R package available at http://bioinformaticsprb.med.wayne.edu/PADOG/ or http://www.bioconductor.org.
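The gene set score described in the Results can be illustrated with a minimal Python sketch. It is not the PADOG R implementation. The weight formula used here, 1 + sqrt((max f - f(g)) / (max f - min f)) with f(g) the number of gene sets containing gene g, is an assumption chosen so that weights fall between 1 (most shared gene) and 2 (most specific gene); the abstract only states that genes appearing in few sets are emphasized. The gene names, set names, and t-scores are hypothetical, and the moderated t-scores are taken as given rather than computed.

```python
from collections import Counter
from math import sqrt


def gene_weights(gene_sets):
    """Weight each gene by how few gene sets it appears in (assumed formula)."""
    # f(g): number of gene sets that contain gene g
    freq = Counter(g for genes in gene_sets.values() for g in set(genes))
    lo, hi = min(freq.values()), max(freq.values())
    if hi == lo:
        # All genes are equally frequent, so no gene is down-weighted
        return {g: 1.0 for g in freq}
    # Assumed weighting: 1 for the most shared gene, 2 for the most specific
    return {g: 1.0 + sqrt((hi - f) / (hi - lo)) for g, f in freq.items()}


def gene_set_score(genes, t_scores, weights):
    """Mean of absolute values of weighted t-scores over the genes of one set."""
    vals = [abs(weights[g] * t_scores[g]) for g in genes if g in t_scores]
    return sum(vals) / len(vals) if vals else 0.0


# Hypothetical gene sets and moderated t-scores, for illustration only
gene_sets = {
    "setA": ["g1", "g2", "g3"],
    "setB": ["g1", "g4"],
    "setC": ["g1", "g2"],
}
t_scores = {"g1": 2.0, "g2": -1.0, "g3": 0.5, "g4": -3.0}

weights = gene_weights(gene_sets)
scores = {name: gene_set_score(genes, t_scores, weights)
          for name, genes in gene_sets.items()}
```

In this toy input, g1 belongs to all three sets and receives the lowest weight, while g3 and g4 each belong to a single set and receive the highest, so setB's score is driven mainly by its specific gene g4. A full analysis would additionally assess the significance of each score, which this sketch omits.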
