MinePath: Mining for Phenotype Differential Sub-paths in Molecular Pathways

Pathway analysis methodologies couple traditional gene expression analysis with knowledge encoded in established molecular pathway networks, offering a promising approach towards the biological interpretation of phenotype-differentiating genes. Early pathway analysis methodologies, known as gene set analysis (GSA), treat pathways as plain lists of genes, taking into account neither the underlying pathway network topology nor the gene regulatory relations involved. These approaches are computationally efficient and simple, but they consider pathways that involve the same genes as equivalent in terms of their gene enrichment characteristics. More recent pathway analysis approaches take into account the underlying gene regulatory relations by examining their consistency with gene expression profiles and computing a score for each profile. Even with this approach, assessing and scoring single relations limits the ability to reveal key gene regulation mechanisms hidden in longer pathway sub-paths. We introduce MinePath, a pathway analysis methodology that addresses these problems. MinePath decomposes pathways into their constituent sub-paths, which transforms single relations into complex regulation sub-paths. Regulation sub-paths are then matched with gene expression sample profiles in order to evaluate their functional status and to assess their phenotype differential power. Assessment of differential power supports the identification of the most discriminant profiles. In addition, MinePath assesses the significance of the pathways as a whole, ranking them by their p-values. Comparison with state-of-the-art pathway analysis systems is indicative of the soundness and reliability of the MinePath approach.
In contrast with many pathway analysis tools, MinePath is a web-based system (www.minepath.org) offering dynamic and rich pathway visualization functionality, with the unique ability to color regulatory relations between genes and reveal their phenotype inclination. This characteristic makes MinePath a valuable tool for in silico molecular biology experimentation, as it serves biomedical researchers' exploratory need to reveal and interpret the regulatory mechanisms that underlie and putatively govern the expression of target phenotypes.
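
The decomposition and matching steps described above can be pictured with a minimal Python sketch. It is not MinePath's implementation: the binary on/off discretization of expression values, the rule used to call a relation consistent with a sample, the choice to stop a sub-path at an inhibition, the `max_len` cap, and all function names are assumptions made for illustration only.

```python
from typing import Dict, List, Set, Tuple

# A regulatory relation: (source gene, target gene, "activation" | "inhibition").
Edge = Tuple[str, str, str]


def enumerate_subpaths(edges: List[Edge], max_len: int = 3) -> List[List[Edge]]:
    """Enumerate simple directed sub-paths made of 1..max_len relations."""
    outgoing: Dict[str, List[Edge]] = {}
    for edge in edges:
        outgoing.setdefault(edge[0], []).append(edge)

    subpaths: List[List[Edge]] = []

    def extend(path: List[Edge], visited: Set[str]) -> None:
        subpaths.append(list(path))
        last = path[-1]
        # Assumption: an inhibited target is "off", so it cannot regulate
        # anything further along the same sub-path.
        if len(path) == max_len or last[2] == "inhibition":
            return
        for nxt in outgoing.get(last[1], []):
            if nxt[1] not in visited:
                path.append(nxt)
                extend(path, visited | {nxt[1]})
                path.pop()

    for edge in edges:
        extend([edge], {edge[0], edge[1]})
    return subpaths


def is_functional(subpath: List[Edge], sample: Dict[str, int]) -> bool:
    """Assumed rule: every regulator is on (1); an activated target is on,
    an inhibited target is off (0)."""
    for src, dst, kind in subpath:
        expected_target = 1 if kind == "activation" else 0
        if sample[src] != 1 or sample[dst] != expected_target:
            return False
    return True


def differential_power(subpath: List[Edge], samples: List[Dict[str, int]],
                       labels: List[str], positive: str) -> float:
    """Fraction of `positive` samples in which the sub-path is functional,
    minus the same fraction over the remaining samples."""
    pos = [is_functional(subpath, s) for s, l in zip(samples, labels) if l == positive]
    neg = [is_functional(subpath, s) for s, l in zip(samples, labels) if l != positive]
    return sum(pos) / len(pos) - sum(neg) / len(neg)
```

Under these assumptions, a score near +1 or -1 marks a sub-path that is functional almost exclusively in one phenotype, while a score near 0 marks a sub-path that does not discriminate between the two. Scoring a whole sub-path rather than each relation separately is what lets a longer regulatory chain stand out even when its individual relations are only weakly discriminant.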
