Pathway Distiller - multisource biological pathway consolidation

Background

One way to understand and evaluate an experiment that produces a large set of genes, such as a gene expression microarray analysis, is to test that set for overrepresentation, or enrichment, of biological pathways. Because pathways describe a gene set functionally, much effort has gone into collecting curated biological pathways in publicly accessible databases. Combining disparate databases yields many highly related or redundant pathways, so consolidating them into pathway concepts is essential. Consolidation facilitates unbiased, comprehensive, yet streamlined analysis of experiments that result in large gene sets.

Methods

After gene set enrichment identifies representative pathways for a large gene set, the pathways are consolidated into representative pathway concepts. Three complementary but distinct consolidation methods are explored. Enrichment Consolidation iteratively combines the pathways enriched for the signature gene list with other pathways that have similar signature gene sets. Weighted Consolidation uses a gene-weighting approach based on a protein-protein interaction network to find clusters of both enriched and non-enriched pathways, limited to the experiment's resultant gene list. Finally, de novo Consolidation uses several measures of pathway similarity to find static pathway clusters that are independent of any given experiment.

Results

We demonstrate that the three consolidation methods provide unified yet different functional insights into a resultant gene set derived from a genome-wide profiling experiment. Results from the methods are presented to demonstrate their application in biological studies and are compared with those of a web-based pathway framework that also combines several pathway databases. Additionally, a web-based consolidation framework that encompasses all three methods discussed in this paper, Pathway Distiller (http://cbbiweb.uthscsa.edu/PathwayDistiller), has been established. It gives researchers access to the methods and example microarray data described in this manuscript and lets them analyze their own gene lists with our consolidation methods.

Conclusions

By combining several pathway systems, implementing different but complementary pathway consolidation methods, and providing a user-friendly, web-accessible tool, we enable users to extract functional explanations of their genome-wide experiments.
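To make the Enrichment Consolidation idea in the Methods more concrete, the following is a minimal sketch in Python, not the authors' implementation. It rests on several assumptions that the abstract does not state: enrichment is scored with a one-sided hypergeometric test, similarity between two pathways is the Jaccard index of their signature-gene subsets, and pathways are merged greedily above a threshold of 0.5. The function names, the threshold, and the toy pathway data are all hypothetical.

```python
from math import comb


def hypergeom_pvalue(universe: int, pathway_size: int, list_size: int, overlap: int) -> float:
    """P(X >= overlap) when list_size genes are drawn from a universe
    that contains pathway_size pathway genes (assumed enrichment test)."""
    upper = min(pathway_size, list_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe - pathway_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(universe, list_size)


def jaccard(a: set, b: set) -> float:
    """Jaccard index of two gene sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def consolidate(pathways: dict, signature: set, threshold: float = 0.5) -> list:
    """Greedily merge pathways whose signature-gene subsets are similar."""
    # Restrict each pathway to the genes it shares with the signature list.
    concepts = [
        ({name}, genes & signature)
        for name, genes in pathways.items()
        if genes & signature
    ]
    merged = True
    while merged:  # repeat until no pair exceeds the threshold
        merged = False
        for i in range(len(concepts)):
            for j in range(i + 1, len(concepts)):
                if jaccard(concepts[i][1], concepts[j][1]) >= threshold:
                    concepts[i] = (
                        concepts[i][0] | concepts[j][0],
                        concepts[i][1] | concepts[j][1],
                    )
                    del concepts[j]
                    merged = True
                    break
            if merged:
                break
    return concepts


# Toy data (hypothetical): two near-duplicate p53 pathways from different
# databases and one unrelated MAPK pathway.
signature = {"TP53", "MDM2", "CDKN1A", "MAPK1", "MAPK3"}
pathways = {
    "p53 signaling (db A)": {"TP53", "MDM2", "CDKN1A", "BAX"},
    "p53 pathway (db B)": {"TP53", "MDM2", "CDKN1A", "ATM"},
    "MAPK signaling": {"MAPK1", "MAPK3", "RAF1"},
}
concepts = consolidate(pathways, signature)
for names, genes in concepts:
    print(sorted(names), sorted(genes))
```

In this toy run the two p53 pathways share an identical signature-gene subset and collapse into one concept, while the MAPK pathway stays separate. A real analysis would first filter pathways by their enrichment p-value (for example with `hypergeom_pvalue`) and would likely need a more careful merge order than this greedy pass.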
