Canonical ensembles for potentially incompatible dependency networks with applications to medical data

A directed graph is either acyclic or cyclic. This paper focuses on the cyclic model, or dependency network, which represents a collection of univariate conditional distributions. The conditional approach offers considerable modeling flexibility because it is computationally convenient to estimate the local distribution of each variable given the remaining variables in a data set. However, the conditional distributions estimated individually within a dependency network are generally not coherent with any joint distribution. The pseudo-Gibbs sampler (PGS) has often been used to estimate joint distributions for such incompatible conditional models. We propose a new method for deriving a joint distribution from a given set of potentially incompatible univariate conditional distributions such that the discrepancies between the given conditional distributions and those computed from the estimated joint distribution are minimized. The method is based on an ensemble of distributions, each of which can be derived from the canonical parameters of the given conditional distributions. Using simulation experiments and real data sets, we compare the performance of the ensemble method, the PGS, and a linear programming (LP)-based method. Our comparisons suggest that the ensemble method outperforms both the PGS and the LP-based method. The ensemble method is computationally efficient and scalable, and it therefore has the potential to open a new avenue for finding nearly optimal solutions for high-dimensional dependency networks.
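To make the notion of incompatibility concrete, the following Python sketch works through two binary variables. It is an illustration only and is not the ensemble method proposed in the paper. The probability tables, the function names, and the use of the maximum absolute difference as the discrepancy measure are all assumptions chosen for this example. The sketch checks compatibility of two strictly positive 2x2 conditional tables by comparing their odds ratios, computes the stationary joint distribution of a fixed-scan pseudo-Gibbs sampler (update X1, then X2) by power iteration, and measures how far the conditionals implied by that joint are from the given ones.

```python
from itertools import product

def odds_ratio(t):
    """Cross-product ratio of a strictly positive 2x2 table."""
    return (t[0][0] * t[1][1]) / (t[0][1] * t[1][0])

def pgs_stationary(a, b, sweeps=500):
    """Stationary joint of a fixed-scan pseudo-Gibbs sampler (X1, then X2).

    a[i][j] = P(X1=i | X2=j); b[i][j] = P(X2=j | X1=i).
    The result is indexed as pi[x1][x2].
    """
    pi = [[0.25, 0.25], [0.25, 0.25]]  # arbitrary starting distribution
    for _ in range(sweeps):
        new = [[0.0, 0.0], [0.0, 0.0]]
        # One sweep: draw y1 ~ P(X1 | x2), then y2 ~ P(X2 | y1).
        for x1, x2, y1, y2 in product(range(2), repeat=4):
            new[y1][y2] += pi[x1][x2] * a[y1][x2] * b[y1][y2]
        pi = new
    return pi

def conditionals(pi):
    """Conditional tables implied by a joint pi[x1][x2]."""
    col = [pi[0][j] + pi[1][j] for j in range(2)]  # marginal of X2
    row = [pi[i][0] + pi[i][1] for i in range(2)]  # marginal of X1
    a = [[pi[i][j] / col[j] for j in range(2)] for i in range(2)]
    b = [[pi[i][j] / row[i] for j in range(2)] for i in range(2)]
    return a, b

def max_abs_diff(s, t):
    """Illustrative discrepancy: largest absolute entrywise difference."""
    return max(abs(s[i][j] - t[i][j]) for i in range(2) for j in range(2))

# A compatible pair, derived from a hypothetical joint distribution.
JOINT = [[0.1, 0.2], [0.3, 0.4]]
A, B_COMPAT = conditionals(JOINT)

# An incompatible pair: same P(X1 | X2), but a hypothetical P(X2 | X1)
# whose odds ratio differs from that of A.
B_INCOMPAT = [[0.5, 0.5], [0.2, 0.8]]

if __name__ == "__main__":
    for name, b in [("compatible", B_COMPAT), ("incompatible", B_INCOMPAT)]:
        pi = pgs_stationary(A, b)
        a_hat, b_hat = conditionals(pi)
        print(name)
        print("  odds ratios:", odds_ratio(A), odds_ratio(b))
        print("  discrepancy in P(X1 | X2):", max_abs_diff(a_hat, A))
        print("  discrepancy in P(X2 | X1):", max_abs_diff(b_hat, b))
```

In the compatible case the sampler is an ordinary Gibbs sampler and recovers the joint exactly. In the incompatible case the stationary joint reproduces only the conditional that was updated last in the scan, so the whole discrepancy falls on the other conditional and depends on the scan order.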
