Evaluation of Causal Structure Learning Methods on Mixed Data Types

Causal structure learning algorithms are important in many fields, including the biomedical sciences, because they can uncover the underlying causal network structure from observational data. Several such algorithms have been developed over the years, but they usually operate on datasets of a single data type, either continuous variables only or discrete variables only. More recently, we and others have proposed causal structure learning algorithms for mixed data types. To date, however, no study has critically evaluated the performance of these methods. In this paper, we provide the first extensive empirical evaluation of several popular causal structure learning methods on mixed data types, across a variety of parameter settings and sample sizes. Our results indicate which method performs best in a given context, and as such they are a first step towards a "method selection guide" for those applying causal modeling methods to real-life datasets.
