A General Framework for Mixed Graphical Models

"Mixed data" comprising a large number of heterogeneous variables (e.g. count, binary, continuous, skewed continuous, among other data types) are prevalent in areas as varied as genomics and proteomics, imaging genetics, national security, social networking, and Internet advertising. There have been limited efforts at statistically modeling such mixed data jointly, in part because of the lack of computationally amenable multivariate distributions that can capture direct dependencies between mixed variables of different types. In this paper, we address this by introducing a novel class of Block Directed Markov Random Fields (BDMRFs). Using the basic building block of node-conditional univariate exponential families from Yang et al. (2012), we introduce a class of mixed conditional random field distributions, which are then chained according to a block-directed acyclic graph to form BDMRFs. The Markov independence graph structure underlying a BDMRF thus has both directed and undirected edges. We introduce conditions under which these distributions exist and are normalizable, study several instances of our models, and propose scalable penalized conditional likelihood estimators with statistical guarantees for recovering the underlying network structure. Simulations, as well as an application to learning mixed genomic networks from next-generation sequencing expression data and mutation data, demonstrate the versatility of our methods.
