Gene Set Analysis: Challenges, Opportunities, and Future Research

Gene set analysis methods are widely used to provide insight into high-throughput gene expression data. Many such methods are available, and they differ in their assumptions, requirements, strengths, and weaknesses. In this paper, we classify gene set analysis methods based on their components, describe the underlying requirements and assumptions for each class, and provide directions for future research in developing and evaluating gene set analysis methods.
