Methods and approaches in the topology-based analysis of biological pathways

The goal of pathway analysis is to identify the pathways significantly impacted in a given phenotype. Many current methods are based on algorithms that consider pathways as simple gene lists, dramatically under-utilizing the knowledge that such pathways are meant to capture. During the past few years, a plethora of methods claiming to incorporate various aspects of the pathway topology have been proposed. These topology-based methods, sometimes referred to as “third generation,” have the potential to better model the phenomena described by pathways. Although there is now a large variety of approaches used for this purpose, no review is currently available to offer guidance for potential users and developers. This review covers 22 such topology-based pathway analysis methods published in the last decade. We compare these methods based on: type of pathways analyzed (e.g., signaling or metabolic), input (subset of genes, all genes, fold changes, gene p-values, etc.), mathematical models, pathway scoring approaches, output (one or more pathway scores, p-values, etc.) and implementation (web-based, standalone, etc.). We identify and discuss challenges, arising both in methodology and in pathway representation, including inconsistent terminology, different data formats, lack of meaningful benchmarks, and the lack of tissue and condition specificity.
