BNPA: An R package to learn path analysis input models from a data set semi-automatically using Bayesian networks

Abstract: Epidemiologists constantly search for methodologies that help them better understand how diseases work, and populations need these improvements to combat diseases more effectively. Several authors in the literature argue that epidemiologists should be able to develop causal models. In this area, structural equation modeling (SEM) has stood out in scientific research. Although SEM is widely used in several research areas, it has been little explored by epidemiologists. Despite its evolution and efficiency, SEM has a gap when it comes to discovering causal relationships. To fill this gap, this study developed an R package called BNPA, whose methodology combines Bayesian network structure learning (BNSL) algorithms, which learn a network structure from data, with path analysis (PA), a subarea of SEM. BNPA includes pre-processing functions. Its main algorithm semi-automatically creates an input model for PA from a data set and generates information for analyzing the performance of the PA. An analysis of the main predictors of cardiovascular disease was performed using BNPA with data from the Canadian Community Health Survey (CCHS). Multiple linear regression (MR) was used as the gold standard methodology; the results of BNPA matched 85% of the MR results. In conclusion, BNPA is efficient and can benefit researchers, mainly novices, by enabling them to build PA models from data. Furthermore, statisticians and PA experts will have more time to support these researchers instead of creating an initial model.
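The core idea of turning a learned network structure into a PA input model can be illustrated with a minimal sketch. This is not the BNPA implementation or its API: the function name `dag_to_path_model`, the variable names, and the edge list are hypothetical, and the output merely imitates the regression-style model syntax used by SEM tools such as lavaan. It assumes a structure learning step has already produced a set of directed edges (parent, child).

```python
from collections import defaultdict


def dag_to_path_model(edges):
    """Turn directed edges (parent, child) into regression-style model lines.

    Hypothetical helper for illustration only; not part of BNPA.
    """
    parents = defaultdict(list)
    for parent, child in edges:
        parents[child].append(parent)
    # One regression line per variable that has at least one parent,
    # with variables sorted so the output is deterministic.
    return "\n".join(
        f"{child} ~ {' + '.join(sorted(parents[child]))}"
        for child in sorted(parents)
    )


# Hypothetical edges standing in for the result of a structure learning step.
edges = [("age", "bmi"), ("age", "cvd"), ("bmi", "cvd"), ("smoking", "cvd")]
model = dag_to_path_model(edges)
print(model)
```

Each line of the resulting string regresses one variable on its parents in the graph, which is the kind of initial model a researcher would otherwise have to write by hand before running a path analysis.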
