Identifying significant edges in graphical models of molecular networks

Objective
Modelling associations in high-throughput experimental molecular data has provided unprecedented insights into biological pathways and signalling mechanisms. Graphical models and networks have proven especially useful abstractions in this regard. Ad hoc thresholds are often used in conjunction with structure learning algorithms to determine which associations are significant. The present study overcomes this limitation by proposing a statistically motivated approach for identifying significant associations in a network.

Methods and materials
A new method is proposed that identifies significant associations in graphical models by estimating the threshold that minimises the L1 norm between the cumulative distribution function (CDF) of the observed edge confidences and that of its asymptotic counterpart. The effectiveness of the proposed method is demonstrated on popular synthetic data sets as well as on publicly available experimental molecular data corresponding to gene and protein expression profiles.

Results
The improved performance of the proposed approach is demonstrated across the synthetic data sets using sensitivity, specificity and accuracy as performance metrics. The improvement is also demonstrated across varying sample sizes and three different structure learning algorithms with widely varying assumptions. In all cases, the proposed approach has specificity and accuracy close to 1, while sensitivity increases linearly in the logarithm of the sample size. The estimated threshold systematically outperforms common ad hoc thresholds in terms of sensitivity while maintaining comparable specificity and accuracy. Networks reconstructed from the experimental data sets accurately match the results reported in the original papers.

Conclusion
Current studies use structure learning algorithms in conjunction with ad hoc thresholds to identify significant associations in graphical abstractions of biological pathways and signalling mechanisms.
Such an ad hoc choice can have a pronounced effect on the biological significance attributed to the associations in the resulting network and on any downstream analysis. The statistically motivated approach presented in this study has been shown to outperform ad hoc thresholds and is expected to reduce spurious conclusions about significant associations in such graphical abstractions.
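
The thresholding step described under Methods and materials can be sketched in Python. This is a minimal sketch, not the authors' implementation, and it rests on assumptions the abstract does not state. Edge confidences are taken to be values in [0, 1] (for example, the fraction of bootstrap networks containing each edge). The "asymptotic counterpart" is assumed to be a step-function CDF with a jump of height t at 0 and a jump of height 1 - t at 1, reflecting edges whose confidence converges to either 0 or 1. Edges whose confidence is strictly above the empirical t-quantile are called significant. Under these assumptions the L1 distance reduces to the integral over [0, 1] of |F_hat(x) - t|, which is convex and piecewise linear in t, so it is enough to evaluate it at the values taken by the empirical CDF F_hat. The function name `estimate_threshold` and the edge labels are hypothetical.

```python
import numpy as np


def estimate_threshold(conf):
    """Estimate a significance threshold for edge confidences in [0, 1].

    Sketch under stated assumptions: the asymptotic CDF is a step function
    F_t with a jump of height t at 0 and a jump of height 1 - t at 1, and
    t_hat minimises the L1 distance between F_t and the empirical CDF.
    Returns (t_hat, threshold); edges with confidence > threshold are kept.
    """
    conf = np.sort(np.asarray(conf, dtype=float))
    k = conf.size
    # Breakpoints of the empirical CDF on [0, 1]; outside [0, 1] both CDFs
    # agree (0 below 0, 1 from 1 onwards), so only [0, 1] contributes.
    knots = np.unique(np.concatenate(([0.0], conf, [1.0])))
    widths = np.diff(knots)
    # Number of confidences <= the left end of each interval; the empirical
    # CDF equals counts / k on the whole interval [knots[j], knots[j + 1]).
    counts = np.searchsorted(conf, knots[:-1], side="right")

    def l1_distance(t):
        # Exact integral of |F_hat(x) - t| over [0, 1] for a step function.
        return float(np.sum(widths * np.abs(counts / k - t)))

    # The distance is convex and piecewise linear in t with kinks at the
    # values counts / k, so its minimum is attained at one of them.
    candidates = np.unique(counts)
    m_hat = int(candidates[np.argmin([l1_distance(m / k) for m in candidates])])
    t_hat = m_hat / k
    # Generalised inverse F_hat^{-1}(t) = min{x : F_hat(x) >= t}, computed from
    # the integer count to avoid rounding; t_hat = 0 means every edge is kept.
    threshold = float(conf[m_hat - 1]) if m_hat > 0 else 0.0
    return t_hat, threshold


if __name__ == "__main__":
    # Hypothetical bootstrap confidences for six candidate edges.
    edges = {"A-B": 0.95, "A-C": 0.10, "B-D": 1.00,
             "C-D": 0.05, "C-E": 0.90, "D-E": 0.15}
    t_hat, threshold = estimate_threshold(list(edges.values()))
    significant = sorted(e for e, p in edges.items() if p > threshold)
    print(t_hat, threshold, significant)  # 0.5 0.15 ['A-B', 'B-D', 'C-E']
```

On this toy input the confidences form a low cluster (0.05 to 0.15) and a high cluster (0.90 to 1.00). The estimated threshold falls at the top of the low cluster, so only the three high-confidence edges are retained. Evaluating the distance only at the values taken by the empirical CDF avoids a numerical search over t.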