A Novel Model Integration Network Inference Algorithm with Clustering and Hub Genes Finding

Inferring gene regulatory networks with high accuracy from gene expression data is one of the most challenging problems in computational biology. To improve the accuracy of gene regulatory network inference and to find hub genes, we propose a novel model integration network inference method with clustering and hub genes finding, called MINICHG. The method has three main steps: (1) constructing a feature matrix from the inference results of three single models; (2) clustering gene pairs with k-means according to the feature matrix; (3) finding hub genes. MINICHG integrates the results of RF (Random Forest), GBDT (Gradient Boosting Decision Tree) and Pearson correlation with a novel weighting strategy in a semi-unsupervised way. Its optimization scheme takes the sparsity of gold standard data into account, which makes it suitable for most gene regulatory network reconstruction tasks. We evaluated the proposed method on simulated data (the five DREAM4 multifactorial data sets and the DREAM5 in silico data set) and on a real data set from E. coli. MINICHG outperformed other network inference methods in both accuracy and robustness.
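The three steps can be illustrated with a minimal sketch. It is not the MINICHG implementation: it assumes the per-pair scores from RF, GBDT and Pearson correlation have already been computed and scaled to [0, 1], it omits the weighting strategy and the optimization scheme (neither is specified in this abstract), and it takes a hub gene to be the regulator with the highest out-degree. The function names and the toy scores are hypothetical.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k=2, iters=50):
    """Plain Lloyd's k-means (k >= 2) with a deterministic start.

    Assumption: initial centers are points evenly spaced in the ordering
    by feature sum, so for k=2 they are the weakest and strongest pair.
    """
    ordered = sorted(points, key=sum)
    n = len(ordered)
    centers = [ordered[round(i * (n - 1) / (k - 1))] for i in range(k)]
    labels = [0] * n
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def infer_edges(pair_scores):
    """Steps 1-2: build the feature matrix and cluster gene pairs.

    pair_scores maps (regulator, target) to a tuple of three scores
    (RF, GBDT, |Pearson|), assumed precomputed and unweighted here.
    """
    pairs = list(pair_scores)
    features = [pair_scores[p] for p in pairs]  # one row per gene pair
    labels, centers = kmeans(features, k=2)
    # Assumption: the cluster with the larger center is the "edge" cluster.
    edge_cluster = max(range(2), key=lambda c: sum(centers[c]))
    return [p for p, lab in zip(pairs, labels) if lab == edge_cluster]


def hub_genes(edges, top=1):
    """Step 3: rank regulators by out-degree in the inferred network."""
    out_degree = Counter(regulator for regulator, _ in edges)
    return [gene for gene, _ in out_degree.most_common(top)]


# Hypothetical toy scores for four genes.
toy_scores = {
    ("G1", "G2"): (0.90, 0.85, 0.80),
    ("G1", "G3"): (0.80, 0.90, 0.75),
    ("G1", "G4"): (0.85, 0.80, 0.90),
    ("G2", "G3"): (0.10, 0.05, 0.15),
    ("G2", "G4"): (0.75, 0.80, 0.70),
    ("G3", "G4"): (0.05, 0.10, 0.05),
    ("G4", "G1"): (0.10, 0.15, 0.10),
    ("G3", "G1"): (0.15, 0.05, 0.10),
}

edges = infer_edges(toy_scores)
hubs = hub_genes(edges)
print(edges)
print(hubs)
```

Clustering the pairs, rather than thresholding a single combined score, means no labeled edges are needed to choose a cutoff, which fits the sparse gold standard setting described above.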
