Learning High-dimensional Directed Acyclic Graphs with Mixed Data-types

In recent years, great strides have been made in causal structure learning, both in the high-dimensional setting and in the mixed data-type setting, where discrete and continuous variables occur together. However, because interactions between continuous and discrete variables are difficult to model, the intersection of these two settings has been relatively understudied. This paper explores the problem of efficiently extending causal structure learning algorithms to high-dimensional data with mixed data-types. First, we characterize a model over continuous and discrete variables. Second, we derive a degenerate Gaussian (DG) score for mixed data-types and discuss its asymptotic properties. Lastly, we demonstrate the practicality of the DG score by learning causal structures from simulated data sets.
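The abstract does not spell out the DG score, so the following is only a minimal sketch of one way such a score could be computed, under stated assumptions: discrete variables are embedded as indicator (one-hot) columns with one category dropped, the embedded data are treated as jointly Gaussian, and each node is scored with a BIC-style penalized log-likelihood. The function names (`embed`, `gaussian_loglik`, `local_score`), the parameter count, and the toy data are all hypothetical and are not taken from the paper.

```python
import numpy as np

def embed(columns, is_discrete):
    """Embed variables as real-valued columns (assumed scheme).

    Discrete variables become indicator columns with the first category
    dropped; continuous variables are kept as a single column.
    Returns the embedded matrix and, per variable, its column indices.
    """
    blocks, index, start = [], [], 0
    for col, disc in zip(columns, is_discrete):
        col = np.asarray(col)
        if disc:
            cats = np.unique(col)[1:]  # drop one category to avoid collinearity
            block = (col[:, None] == cats[None, :]).astype(float)
        else:
            block = col.astype(float)[:, None]
        blocks.append(block)
        index.append(list(range(start, start + block.shape[1])))
        start += block.shape[1]
    return np.hstack(blocks), index

def gaussian_loglik(cov, idx, n):
    """Maximized Gaussian log-likelihood of the columns in idx."""
    if not idx:
        return 0.0
    _, logdet = np.linalg.slogdet(cov[np.ix_(idx, idx)])
    return -0.5 * n * (logdet + len(idx) * (1.0 + np.log(2.0 * np.pi)))

def local_score(X, index, child, parents):
    """BIC-style local score of `child` given `parents` (higher is better)."""
    n = X.shape[0]
    cov = np.atleast_2d(np.cov(X, rowvar=False, bias=True))
    ch = index[child]
    pa = [j for p in parents for j in index[p]]
    # Conditional log-likelihood as a difference of two joint log-likelihoods.
    loglik = gaussian_loglik(cov, ch + pa, n) - gaussian_loglik(cov, pa, n)
    # Assumed parameter count: coefficients, intercepts, residual covariance.
    n_params = len(ch) * (len(pa) + 1) + len(ch) * (len(ch) + 1) / 2
    return loglik - 0.5 * n_params * np.log(n)

# Toy data (hypothetical): a 3-level discrete variable d drives a continuous x.
rng = np.random.default_rng(0)
n = 500
d = rng.integers(0, 3, size=n)
x = d + 0.5 * rng.normal(size=n)

X, index = embed([d, x], is_discrete=[True, False])
score_with_parent = local_score(X, index, child=1, parents=[0])
score_without_parent = local_score(X, index, child=1, parents=[])
```

In this toy example the score of `x` with `d` as a parent should exceed the score with no parents, since `x` was generated from `d`. A score-based search such as greedy equivalence search would sum local scores of this kind over all nodes of a candidate graph.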
