Weighted ensemble learning of Bayesian network for gene regulatory networks

Abstract A gene regulatory network (GRN) is widely regarded as a suitable representation of gene interactions inferred from microarray datasets. Bayesian networks (BNs) are among the best-performing modeling tools for inferring these networks. When preceded by an efficient pre-processing step, BN learning can reveal possible relationships between key disease genes, allowing biologists to analyze and exploit these interactions. However, microarray data are laid out differently from classic data, typically with far more genes than samples. This particularity poses challenges for BN learning in terms of dimensionality and over-fitting. In this paper, we propose a fuzzy ensemble clustering method that outputs small, highly inter-correlated partitions of genes, which lets us overcome the dimensionality problem. We present a weighted committee-based structure learning algorithm that learns a BN for each partition without over-fitting the training dataset. Moreover, we offer an approach for assembling the sub-BNs through the genes they have in common. We also verify our approach statistically and validate it biologically.
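The abstract does not spell out the algorithms, so the following is only a minimal sketch of the two general ideas it names: combining several candidate structures for one gene partition by weighted voting, and joining sub-networks through shared genes. The function names `weighted_edge_vote` and `merge_subnetworks`, the edge-set representation, the weights, and the threshold are all illustrative assumptions, not the authors' method.

```python
from collections import defaultdict


def weighted_edge_vote(structures, weights, threshold=0.5):
    """Hypothetical committee vote over directed edges.

    structures: list of edge sets, each edge a (parent, child) tuple.
    weights:    one non-negative weight per structure (assumed given).
    Returns the edges whose normalized weighted support >= threshold.
    """
    total = sum(weights)
    support = defaultdict(float)
    for edges, weight in zip(structures, weights):
        for edge in set(edges):
            support[edge] += weight / total
    return {edge for edge, score in support.items() if score >= threshold}


def merge_subnetworks(subnets):
    """Hypothetical assembly step: union the edge sets of sub-networks.

    Genes that appear in more than one sub-network act as the joining
    points. A plain union may introduce cycles, so a real procedure
    would need an additional acyclicity check (not shown here).
    """
    merged = set()
    for edges in subnets:
        merged |= set(edges)
    return merged


# Three candidate structures for one partition, with assumed weights.
candidates = [
    {("A", "B"), ("B", "C")},
    {("A", "B")},
    {("A", "B"), ("C", "B")},
]
consensus = weighted_edge_vote(candidates, weights=[5, 3, 2], threshold=0.5)

# A second partition that shares gene "C" with the first one.
other = {("C", "D")}
network = merge_subnetworks([consensus, other])
```

Here the edge from A to B is supported by every candidate, the edge from B to C is kept because its supporting structure carries half of the total weight, and the edge from C to B is dropped. The second partition attaches to the first through the shared gene C.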
