Bayesian optimization of the PC algorithm for learning Gaussian Bayesian networks

The PC algorithm is a popular method for learning the structure of Gaussian Bayesian networks. It carries out statistical tests of conditional independence to determine which edges are absent from the network. It is hence governed by two parameters: (i) the type of test, and (ii) its significance level. These parameters are usually set to values recommended by an expert. Such an approach, however, can suffer from human bias, leading to suboptimal reconstruction results. In this paper we consider a more principled approach that chooses these parameters automatically. To this end, we optimize a reconstruction score evaluated on a set of different Gaussian Bayesian networks. This objective is expensive to evaluate and lacks a closed-form expression, which makes Bayesian optimization (BO) a natural choice. BO methods use a model to guide the search and are hence able to exploit smoothness properties of the objective surface. We show that the parameters found by a BO method outperform those found by a random search strategy and those recommended by the expert. Importantly, we have found that an often overlooked statistical test provides the best overall reconstruction results.
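To make the role of the two parameters concrete, the following is a minimal sketch of a single edge decision in a PC-style procedure. It assumes Fisher's z-test on the (partial) correlation, a common choice for Gaussian data; the abstract does not say which tests are compared. The function names `fisher_z_pvalue` and `edge_is_removed` and the simulated chain X -> Z -> Y are hypothetical and serve only as illustration.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def fisher_z_pvalue(x, y, z=None):
    """Two-sided p-value for H0: x is independent of y (given z, if supplied).

    Illustrative only: handles at most one conditioning variable.
    """
    n = len(x)
    r = pearson(x, y)
    k = 0  # size of the conditioning set
    if z is not None:
        r_xz, r_yz = pearson(x, z), pearson(y, z)
        # First-order partial correlation of x and y given z.
        r = (r - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
        k = 1
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    stat = math.sqrt(n - k - 3) * abs(math.atanh(r))
    # erfc(s / sqrt(2)) equals 2 * (1 - Phi(s)) for the standard normal CDF Phi.
    return math.erfc(stat / math.sqrt(2))


def edge_is_removed(p_value, alpha):
    """The edge is dropped when independence is NOT rejected at level alpha."""
    return p_value > alpha


# Simulated chain X -> Z -> Y (assumed example data, not from the paper).
random.seed(0)
x = [random.gauss(0, 1) for _ in range(500)]
z = [xi + random.gauss(0, 1) for xi in x]
y = [zi + random.gauss(0, 1) for zi in z]

p_marginal = fisher_z_pvalue(x, y)          # X and Y are marginally dependent
p_conditional = fisher_z_pvalue(x, y, z)    # X and Y are independent given Z

for alpha in (0.01, 0.05):
    print(alpha, edge_is_removed(p_conditional, alpha))
```

In this sketch, the same p-value can lead to different edge decisions depending on the significance level, and a different test would yield a different p-value in the first place. This is why both parameters affect the reconstructed network.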
