A strategy for evaluating pathway analysis methods

Background: Researchers have developed many methods for identifying biological pathways associated with specific clinical or experimental conditions of interest, with the aim of facilitating biological interpretation of high-throughput data. Before applying such pathway analysis (PA) methods in practice, we must first evaluate their performance and reliability using datasets in which the pathways perturbed by the conditions of interest have been well characterized in advance. However, such ‘ground truths’ (or gold standards) are often unavailable. Furthermore, previous evaluation strategies that have focused on defining ‘true answers’ are unable to systematically and objectively assess PA methods under a wide range of conditions.

Results: In this work, we propose a novel strategy for evaluating PA methods independently of any gold standard, either established or assumed. The strategy uses two mutually complementary metrics, recall and discrimination. Recall measures the consistency between the perturbed pathways identified by applying a particular analysis method to an original large dataset and those identified by applying the same method to a sub-dataset of that original dataset. In contrast, discrimination measures specificity: the degree to which the perturbed pathways identified by applying a particular method to a dataset from one experiment differ from those identified by applying the same method to a dataset from a different experiment. We used these metrics and 24 datasets to evaluate six widely used PA methods. The results highlighted the common challenge of reliably identifying significant pathways from small datasets. Importantly, we confirmed the effectiveness of our proposed dual-metric strategy by showing that previous comparative studies corroborate the performance evaluations of the six methods obtained by our strategy.

Conclusions: Unlike any previously proposed strategy for evaluating the performance of PA methods, our dual-metric strategy does not rely on any ground truth, either established or assumed, of the pathways perturbed by a specific clinical or experimental condition. As such, our strategy allows researchers to systematically and objectively evaluate PA methods by employing any number of datasets for a variety of conditions.
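
To make the two metrics concrete, the following Python sketch treats each PA result as a set of significant pathway names. The abstract does not give the exact formulas, so the definitions here are assumptions chosen for illustration: recall is taken as the fraction of pathways found in the full dataset that are also found in the sub-dataset, and discrimination as one minus the Jaccard similarity between the results from two different experiments. The function names and the pathway names in the example are hypothetical.

```python
def recall(full_pathways, sub_pathways):
    """Assumed definition: share of the pathways identified from the full
    dataset that are also identified from a sub-dataset of it."""
    full = set(full_pathways)
    if not full:
        # Nothing was identified from the full dataset, so nothing can be recovered.
        return 0.0
    return len(full & set(sub_pathways)) / len(full)


def discrimination(pathways_a, pathways_b):
    """Assumed definition: 1 - Jaccard similarity between the pathways
    identified from two datasets that come from different experiments."""
    a, b = set(pathways_a), set(pathways_b)
    union = a | b
    if not union:
        # Two empty results show no evidence that the method tells the experiments apart.
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical results from one PA method (pathway names are illustrative only).
full_result = {"Apoptosis", "Cell cycle", "MAPK signaling", "p53 signaling"}
sub_result = {"Apoptosis", "Cell cycle", "Wnt signaling"}
other_experiment = {"Cell cycle", "Oxidative phosphorylation"}

method_recall = recall(full_result, sub_result)
method_discrimination = discrimination(full_result, other_experiment)
```

Under these assumed definitions, a method that returns the same pathways for every dataset would score high on recall but near zero on discrimination, which is why the two metrics are used together.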
