A New Algorithm for Learning Large Bayesian Network Structure From Discrete Data

Learning the structure of Bayesian networks (BNs) from high-dimensional discrete data is a common but challenging task, owing to the large parameter space, the acyclicity constraint placed on the graph structure, and the difficulty of searching for a sparse structure. In this article, we propose a sparse structure learning algorithm (SSLA) to address this problem. The algorithm uses the negative log-likelihood of multinomial logistic (multi-logit) regression as the loss function, adds an adaptive group lasso penalty to induce sparsity, and introduces a new penalty term to ensure that the learned graph is a directed acyclic graph (DAG). To solve the resulting model, we develop a block coordinate descent (BCD) algorithm combined with the alternating direction method of multipliers (ADMM). We prove theoretically that the learned graph is a Bayesian network. To evaluate SSLA and compare it with competing methods, we conducted extensive simulation studies and applied all methods to benchmark Bayesian networks. The results indicate that SSLA outperforms the hill climbing (HC), CD, and BFO-B algorithms, and is competitive with the K2 algorithm when the node ordering is given.
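
The loss-plus-penalty model described above can be illustrated for a single node with a short sketch. It assumes each discrete parent candidate is dummy-coded against a reference level, so that one "group" is the set of design-matrix rows belonging to one candidate parent; the adaptive weights follow the usual form w_g = 1 / ||B_init[g]||^gamma. All function names are hypothetical. Two substitutions are made deliberately: a plain proximal-gradient step stands in for the authors' BCD/ADMM solver, and the new acyclicity penalty is not reproduced at all; is_dag is only a post-hoc check on an assembled adjacency matrix.

```python
import numpy as np


def softmax(Z):
    # Row-wise softmax, shifted by the row max for numerical stability.
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)


def multilogit_nll(X, y, B):
    # Negative log-likelihood of a multinomial logistic (multi-logit) regression.
    # X: (n, p) design matrix (dummy-coded parents plus an intercept column),
    # y: (n,) integer class labels in 0..K-1, B: (p, K) coefficient matrix.
    P = softmax(X @ B)
    return -np.sum(np.log(P[np.arange(len(y)), y]))


def multilogit_grad(X, y, B):
    # Gradient of multilogit_nll with respect to B: X^T (P - Y_onehot).
    P = softmax(X @ B)
    P[np.arange(len(y)), y] -= 1.0
    return X.T @ P


def adaptive_weights(B_init, groups, gamma=1.0, eps=1e-8):
    # Adaptive group lasso weights w_g = 1 / ||B_init[g]||^gamma: groups that an
    # initial fit finds weak receive a heavier penalty.
    return [1.0 / (np.linalg.norm(B_init[g, :]) + eps) ** gamma for g in groups]


def adaptive_group_lasso_penalty(B, groups, weights, lam):
    # lam * sum_g w_g * ||B[g, :]||_F; rows outside every group (the intercept)
    # are left unpenalized.
    return lam * sum(w * np.linalg.norm(B[g, :]) for g, w in zip(groups, weights))


def group_soft_threshold(v, t):
    # Proximal operator of t * ||v||: shrinks the whole group toward zero and
    # sets it exactly to zero when its norm is at most t.
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v


def prox_grad_step(X, y, B, groups, weights, lam, step):
    # One proximal-gradient step on NLL + adaptive group lasso (a stand-in for
    # the BCD/ADMM solver, not the authors' method).
    V = B - step * multilogit_grad(X, y, B)
    B_new = V.copy()
    for g, w in zip(groups, weights):
        B_new[g, :] = group_soft_threshold(V[g, :], step * lam * w)
    return B_new


def is_dag(adj):
    # Kahn's algorithm: the graph is acyclic iff every node can be removed in
    # topological order. adj[i][j] != 0 means an edge i -> j.
    A = np.asarray(adj) != 0
    indeg = A.sum(axis=0).astype(int)
    queue = [i for i in range(A.shape[0]) if indeg[i] == 0]
    seen = 0
    while queue:
        i = queue.pop()
        seen += 1
        for j in np.flatnonzero(A[i]):
            indeg[j] -= 1
            if indeg[j] == 0:
                queue.append(j)
    return seen == A.shape[0]
```

In use, one would repeat prox_grad_step for each node with a step size no larger than 1 / ||X||_2^2 (which bounds the curvature of the summed multi-logit loss), read the node's parent set off the groups that remain nonzero, and assemble the adjacency matrix. Because this sketch omits the acyclicity penalty, nothing guarantees that the assembled graph passes is_dag; in SSLA that guarantee comes from the additional penalty term.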
