Efficient computational strategies to learn the structure of probabilistic graphical models of cumulative phenomena

Abstract: Structural learning of Bayesian Networks (BNs) is an NP-hard problem, which is further complicated by many theoretical issues, such as the I-equivalence among different structures. In this work, we focus on a specific subclass of BNs, named Suppes-Bayes Causal Networks (SBCNs), which include specific structural constraints based on Suppes’ probabilistic causation to efficiently model cumulative phenomena. Through extensive simulations, we compare the performance of various state-of-the-art search strategies, such as local search techniques and Genetic Algorithms, as well as of distinct regularization methods. The assessment is performed on a large number of simulated datasets generated from topologies with distinct levels of complexity, various sample sizes and different error rates in the data. Among the main results, we show that the introduction of Suppes’ constraints dramatically improves the inference accuracy, by reducing the solution space and providing a temporal ordering on the variables. We also report on trade-offs among different search techniques that can be efficiently employed in distinct experimental settings. This manuscript is an extended version of the paper “Structural Learning of Probabilistic Graphical Models of Cumulative Phenomena” presented at the 2018 International Conference on Computational Science [1].
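To illustrate how Suppes’ constraints can reduce the solution space, the following minimal sketch filters candidate edges on binary data. It assumes the usual formulation of the two conditions, which the abstract does not spell out: temporal priority, P(u) > P(v), and probability raising, P(v | u) > P(v | not u). The function name `suppes_candidate_edges` and the toy dataset are hypothetical and not taken from the paper.

```python
from itertools import permutations


def suppes_candidate_edges(data):
    """Return directed pairs (u, v) satisfying both assumed Suppes conditions.

    data: list of rows, each a list of 0/1 values (one column per event).
    """
    n_rows = len(data)
    n_vars = len(data[0])
    # Marginal frequency of each event, used for temporal priority.
    marg = [sum(row[j] for row in data) / n_rows for j in range(n_vars)]

    def cond(v, u, u_val):
        # Estimate P(v = 1 | u = u_val); None if no row has u == u_val.
        rows = [row for row in data if row[u] == u_val]
        if not rows:
            return None
        return sum(row[v] for row in rows) / len(rows)

    edges = []
    for u, v in permutations(range(n_vars), 2):
        p_v_given_u = cond(v, u, 1)
        p_v_given_not_u = cond(v, u, 0)
        if p_v_given_u is None or p_v_given_not_u is None:
            continue
        temporal_priority = marg[u] > marg[v]              # assumed: P(u) > P(v)
        probability_raising = p_v_given_u > p_v_given_not_u  # assumed: P(v|u) > P(v|not u)
        if temporal_priority and probability_raising:
            edges.append((u, v))
    return edges


# Hypothetical toy data: event 1 is only observed together with event 0,
# while event 2 occurs independently of the other two.
data = [
    [1, 1, 0],
    [1, 1, 1],
    [1, 0, 0],
    [1, 0, 1],
    [0, 0, 0],
    [0, 0, 1],
]

candidates = suppes_candidate_edges(data)
all_pairs = list(permutations(range(3), 2))
print(f"{len(candidates)} of {len(all_pairs)} ordered pairs kept: {candidates}")
```

In this toy case only the pair (0, 1) survives out of six ordered pairs, so a score-based search would only need to consider structures built from that single candidate edge.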

[1]  Wray L. Buntine. Theory Refinement on Bayesian Networks, 1991, UAI.

[2]  Aric Hagberg, et al. Exploring Network Structure, Dynamics, and Function using NetworkX, 2008, Proceedings of the Python in Science Conference.

[3]  Niko Beerenwinkel, et al. Quantifying cancer progression with conjunctive Bayesian networks, 2009, Bioinform.

[4]  Giancarlo Mauri, et al. Algorithmic methods to infer the evolutionary trajectories in cancer progression, 2015, Proceedings of the National Academy of Sciences.

[5]  Daniele Ramazzotti, et al. Modeling Cumulative Biological Phenomena with Suppes-Bayes Causal Networks, 2016, bioRxiv.

[6]  Giancarlo Mauri, et al. CAPRI: Efficient Inference of Cancer Progression Models from Cross-sectional Data, 2014, bioRxiv.

[7]  David Maxwell Chickering, et al. Large-Sample Learning of Bayesian Networks is NP-Hard, 2002, J. Mach. Learn. Res.

[8]  Fred Glover, et al. Tabu Search - Part II, 1989, INFORMS J. Comput.

[9]  David Maxwell Chickering, et al. Learning Bayesian Networks is, 1994.

[10]  P. Suppes. A Probabilistic Theory Of Causality, 1970.

[11]  Scott Kirkpatrick. Optimization by Simulated Annealing: Quantitative Studies, 1984.

[12]  Giancarlo Mauri, et al. TRONCO: an R package for the inference of cancer progression models from heterogeneous genomic data, 2015, bioRxiv.

[13]  J. Lagergren, et al. Learning Oncogenetic Networks by Reducing to Mixed Integer Linear Programming, 2013, PloS one.

[14]  Gregory F. Cooper, et al. A Bayesian Method for Constructing Bayesian Belief Networks from Databases, 1991, UAI.

[15]  P. Spirtes, et al. Causation, prediction, and search, 1993.

[16]  Constantin F. Aliferis, et al. Algorithms for Large Scale Markov Blanket Discovery, 2003, FLAIRS.

[17]  Nir Friedman, et al. Probabilistic Graphical Models - Principles and Techniques, 2009.

[18]  Nicholas Eriksson, et al. Conjunctive Bayesian networks, 2006, math/0608417.

[19]  Harry Eugene Stanley, et al. Catastrophic cascade of failures in interdependent networks, 2009, Nature.

[20]  K. Pearson. Mathematical contributions to the theory of evolution.—On a form of spurious correlation which may arise when indices are used in the measurement of organs, 1897, Proceedings of the Royal Society of London.

[21]  Gregory F. Cooper, et al. A Bayesian method for the induction of probabilistic networks from data, 1992, Machine Learning.

[22]  John H. Holland, et al. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence, 1992.

[23]  David E. Goldberg, et al. Genetic Algorithms in Search Optimization and Machine Learning, 1988.

[24]  Pedro Larrañaga, et al. Structure Learning of Bayesian Networks by Genetic Algorithms, 1994.

[25]  I. Good, et al. The Amalgamation and Geometry of Two-by-Two Contingency Tables, 1987.

[26]  Bud Mishra, et al. Causal data science for financial stress testing, 2017, J. Comput. Sci.

[27]  John H. Holland, et al. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence, 1992.

[28]  David Maxwell Chickering, et al. Learning Bayesian Networks: The Combination of Knowledge and Statistical Data, 1994, Machine Learning.

[29]  Giancarlo Mauri, et al. Parallel implementation of efficient search schemes for the inference of cancer progression models, 2016, 2016 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB).

[30]  Scott Kirkpatrick, et al. Optimization by simulated annealing: Quantitative studies, 1984.

[31]  H. Akaike, et al. Information Theory and an Extension of the Maximum Likelihood Principle, 1973.

[32]  G. Schwarz. Estimating the Dimension of a Model, 1978.

[33]  Giancarlo Mauri, et al. Design of the TRONCO BioConductor Package for TRanslational ONCOlogy, 2016, R J.

[34]  Giancarlo Mauri, et al. Inferring Tree Causal Models of Cancer Progression with Probability Raising, 2013, bioRxiv.

[35]  Pedro Larrañaga, et al. Structure Learning of Bayesian Networks by Genetic Algorithms: A Performance Analysis of Control Parameters, 1996, IEEE Trans. Pattern Anal. Mach. Intell.

[36]  Francesco Bonchi, et al. Exposing the probabilistic causal structure of discrimination, 2015, International Journal of Data Science and Analytics.

[37]  Judea Pearl, et al. Equivalence and Synthesis of Causal Models, 1990, UAI.

[38]  Marco S. Nobile, et al. Learning the Probabilistic Structure of Cumulative Phenomena with Suppes-Bayes Causal Networks, 2017, ICCS.

[39]  Thomas Bäck, et al. Selective Pressure in Evolutionary Algorithms: A Characterization of Selection Mechanisms, 1994, International Conference on Evolutionary Computation.

[40]  Daphne Koller, et al. Ordering-Based Search: A Simple and Effective Algorithm for Learning Bayesian Networks, 2005, UAI.