An iterative feature selection method for GRNs inference by exploring topological properties

An important problem in bioinformatics is the inference of gene regulatory networks (GRNs) from temporal expression profiles. In general, the main limitations faced by GRN inference methods are the small number of samples, their huge dimensionality, and the noisy nature of the expression measurements. Given these limitations, alternatives are needed to improve accuracy on the GRN inference problem. This work addresses the problem by presenting an alternative feature selection method, called SFFS-BA, that applies prior knowledge in its search strategy. The proposed search strategy is based on the Sequential Floating Forward Selection (SFFS) algorithm and includes scale-free (Barabási-Albert) topology information to guide the search process and improve inference. The proposed algorithm exploits the scale-free property by pruning the search space, using a power law as a weight for reducing it. In this way, the search space traversed by the SFFS-BA method combines a breadth-first search when the number of combinations is small ( >= 3), being guided by the scale-free prior information. Experimental results show that SFFS-BA provides better inference similarity than Sequential Forward Selection (SFS) and SFFS while keeping the robustness of both methods.
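
To make the two ingredients named above concrete, the following is a minimal Python sketch of a plain SFFS search together with a normalized power-law weight over the number of predictors k. It is not the SFFS-BA implementation: the abstract does not say how the power-law weight is coupled to the search, so the two functions are kept separate. The names `sffs`, `power_law_weights`, `score`, `max_size`, and the exponent `gamma = 2.5` are all assumptions made for illustration.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Subset = FrozenSet[int]


def power_law_weights(max_degree: int, gamma: float = 2.5) -> Dict[int, float]:
    """Normalized weights w(k) proportional to k**(-gamma), k = 1..max_degree.

    gamma is an assumed exponent, not a value taken from the paper.
    """
    raw = {k: k ** (-gamma) for k in range(1, max_degree + 1)}
    total = sum(raw.values())
    return {k: v / total for k, v in raw.items()}


def sffs(n_features: int,
         score: Callable[[Subset], float],
         max_size: int) -> Tuple[Subset, float]:
    """Plain Sequential Floating Forward Selection, maximizing `score`.

    `score` is a hypothetical criterion function supplied by the caller.
    """
    current: Subset = frozenset()
    best_by_size: Dict[int, Tuple[Subset, float]] = {}
    while len(current) < max_size:
        # Forward step: add the single feature that helps the most.
        candidates = [f for f in range(n_features) if f not in current]
        if not candidates:
            break
        add = max(candidates, key=lambda f: score(current | {f}))
        current = current | {add}
        k, s = len(current), score(current)
        if k not in best_by_size or s > best_by_size[k][1]:
            best_by_size[k] = (current, s)
        else:
            # Keep the best subset already known for this size.
            current = best_by_size[k][0]
        # Floating step: drop features while that strictly improves
        # the best known subset of the smaller size.
        while len(current) > 2:
            drop = max(current, key=lambda f: score(current - {f}))
            reduced = current - {drop}
            s_red = score(reduced)
            if s_red > best_by_size[len(reduced)][1]:
                current = reduced
                best_by_size[len(reduced)] = (reduced, s_red)
            else:
                break
    return max(best_by_size.values(), key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy criterion: reward hypothetical "true" predictors {1, 3},
    # penalize any other feature in the subset.
    true_predictors = frozenset({1, 3})

    def toy_score(subset: Subset) -> float:
        return len(subset & true_predictors) - 0.5 * len(subset - true_predictors)

    print(sffs(n_features=5, score=toy_score, max_size=3))
    print(power_law_weights(max_degree=4))
```

In a GRN setting, one such search would be run per target gene, with the candidate features being the other genes and `score` measuring how well a predictor subset explains the target's expression. The power-law weights decay quickly with k, which matches the intuition that most genes in a scale-free network have few regulators.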
