piMGM: incorporating multi‐source priors in mixed graphical models for learning disease networks

Motivation: Learning probabilistic graphs over mixed data is an important way to combine gene expression and clinical disease data. Leveraging the existing, yet imperfect, information in pathway databases for mixed graphical model (MGM) learning is an understudied problem with tremendous potential applications in systems medicine, where problems often involve high-dimensional data.

Results: We present a new method, piMGM, which accurately learns the structure of probabilistic graphs over mixed data by appropriately incorporating priors from multiple experts with different degrees of reliability. We show that piMGM accurately scores the reliability of the prior information from a given expert, even at low sample sizes. These reliability scores can be used to determine active pathways in healthy and disease samples. We tested piMGM on both simulated and real data from TCGA and found that its performance is not affected by unreliable priors. We demonstrate the applicability of piMGM by successfully using prior information to identify pathway components that are important in breast cancer and to improve cancer subtype classification.

Availability and implementation: http://www.benoslab.pitt.edu/manatakisECCB2018.html

Supplementary information: Supplementary data are available at Bioinformatics online.
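The abstract does not specify how reliability scores are computed or how they enter the MGM fit. Purely as an illustration of the general idea (this is not piMGM's actual procedure), the sketch below assumes that each expert contributes a set of prior edges, that a data-derived edge set is already available from some prior-free fit, and that each expert's reliability is a Laplace-smoothed fraction of its edges that the data support. Reliable experts then lower the sparsity penalty on the edges they propose. The helper names `edge`, `reliability` and `edge_penalties`, the smoothing, and the product-of-(1 - reliability) combination rule are all hypothetical choices.

```python
from math import prod


def edge(u, v):
    """Hypothetical helper: an undirected edge as an order-independent key."""
    return frozenset((u, v))


def reliability(prior_edges, data_edges):
    """Hypothetical reliability score (not piMGM's): Laplace-smoothed fraction
    of an expert's prior edges that also appear in the data-derived edge set."""
    hits = len(prior_edges & data_edges)
    return (hits + 1) / (len(prior_edges) + 2)


def edge_penalties(experts, data_edges, all_edges, base_lambda):
    """Assumed combination rule: every expert proposing an edge multiplies its
    penalty by (1 - reliability), so edges backed by several reliable experts
    are penalized least; edges without any prior keep base_lambda."""
    scores = {name: reliability(edges, data_edges) for name, edges in experts.items()}
    penalties = {}
    for e in all_edges:
        # prod() of an empty sequence is 1, i.e. no prior leaves the penalty unchanged
        factor = prod(1 - scores[name] for name, edges in experts.items() if e in edges)
        penalties[e] = base_lambda * factor
    return scores, penalties


# Toy usage with two hypothetical pathway "experts" over four variables.
experts = {
    "pathway_db_A": {edge("a", "b"), edge("b", "c")},
    "pathway_db_B": {edge("a", "d"), edge("b", "c")},
}
data_edges = {edge("a", "b"), edge("b", "c"), edge("c", "d")}
nodes = ["a", "b", "c", "d"]
all_edges = {edge(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]}

scores, penalties = edge_penalties(experts, data_edges, all_edges, base_lambda=0.4)
print(scores)  # {'pathway_db_A': 0.75, 'pathway_db_B': 0.5}
```

Here the edge b-c, proposed by both experts, receives the smallest penalty. The edge c-d, which no expert proposes, keeps the base penalty of 0.4. An unreliable expert (score near 0) barely changes any penalty. This mirrors, in a very simplified form, the abstract's point that unreliable priors should not degrade the learned graph.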
