GeNICE: A Novel Framework for Gene Network Inference by Clustering, Exhaustive Search, and Multivariate Analysis

Gene network (GN) inference from temporal gene expression data is a crucial and challenging problem in systems biology. Expression data sets usually consist of only dozens of temporal samples, while networks consist of thousands of genes, which renders many inference methods infeasible in practice. To improve the scalability of GN inference methods, we propose a novel framework called GeNICE, based on probabilistic GNs. Its main novelty is a clustering procedure that groups genes with related expression profiles and provides an approximate solution with reduced computational complexity. We use the resulting clusters to perform an exhaustive search that retrieves the best predictor gene subsets for each target gene according to multivariate criterion functions. GeNICE greatly reduces the search space because predictor candidates are restricted to one gene per cluster. Finally, a multivariate analysis is performed on each selected predictor subset to retrieve minimal subsets and simplify the network. In our experiments with in silico generated data sets, GeNICE achieved a substantial reduction in computation time compared with solutions without the clustering step, while preserving gene expression prediction accuracy even when the number of clusters is small (about 50) relative to the number of genes (on the order of thousands). For a Plasmodium falciparum microarray data set, the prediction accuracy achieved by GeNICE was roughly 97%, and the topologies involving glycolytic and apicoplast seed genes showed very high intramodularity, very little interconnection between modules, and some module hub genes, reflecting small-world and scale-free topological properties, as expected.
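The three stages described above (clustering, exhaustive search restricted to one candidate per cluster, and reduction to minimal predictor subsets) can be illustrated with a minimal Python sketch. It is not the GeNICE implementation. Several choices are assumptions made only for illustration: k-means as the clustering procedure, the gene closest to each centroid as the cluster representative, conditional entropy on already-discretized data as the multivariate criterion function, and a simple drop-if-no-worse rule for the final reduction. All function names (`kmeans`, `pick_representatives`, `conditional_entropy`, `best_predictors`, `prune`) are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
import math
import random


def sqdist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles, k, iters=25, seed=0):
    """Plain k-means over gene profiles (assumed clustering step)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(profiles, k)]
    labels = [0] * len(profiles)
    for _ in range(iters):
        # Assign every gene to its nearest centroid.
        labels = [min(range(k), key=lambda c: sqdist(p, centroids[c]))
                  for p in profiles]
        # Move each centroid to the mean of its members (skip empty clusters).
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def pick_representatives(profiles, labels, centroids):
    """One candidate predictor per non-empty cluster: the gene nearest its centroid."""
    reps = []
    for c, centroid in enumerate(centroids):
        members = [g for g, lab in enumerate(labels) if lab == c]
        if members:
            reps.append(min(members, key=lambda g: sqdist(profiles[g], centroid)))
    return reps


def conditional_entropy(data, predictors, target):
    """H(target[t+1] | predictors[t]) in bits, from discretized time series."""
    counts = defaultdict(Counter)
    steps = len(data[target]) - 1
    for t in range(steps):
        state = tuple(data[g][t] for g in predictors)
        counts[state][data[target][t + 1]] += 1
    h = 0.0
    for outcomes in counts.values():
        n = sum(outcomes.values())
        for c in outcomes.values():
            p = c / n
            h -= (n / steps) * p * math.log2(p)
    return h


def best_predictors(data, candidates, target, max_size=3):
    """Exhaustive search over candidate subsets; lower entropy is better."""
    best, best_score = None, math.inf
    for size in range(1, max_size + 1):
        for subset in combinations(candidates, size):
            score = conditional_entropy(data, subset, target)
            if score < best_score:  # strict: ties keep the smaller subset
                best, best_score = subset, score
    return best, best_score


def prune(data, predictors, target, tol=1e-12):
    """Drop any predictor whose removal does not worsen the criterion."""
    current = list(predictors)
    for g in predictors:
        reduced = [p for p in current if p != g]
        if reduced and (conditional_entropy(data, reduced, target)
                        <= conditional_entropy(data, current, target) + tol):
            current = reduced
    return tuple(current)
```

In this sketch, the candidate list passed to `best_predictors` would be the output of `pick_representatives`, so the search cost depends on the number of clusters rather than the number of genes. For instance, with 50 candidates and subsets of at most three predictors (the size cap is an assumption here), the search visits 50 + 1,225 + 19,600 = 20,875 subsets per target gene.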
