A feature selection technique for inference of graphs from their known topological properties: Revealing scale-free gene regulatory networks

Abstract An important problem in bioinformatics is the inference of gene regulatory networks (GRNs) from expression profiles. The main limitations faced by GRN inference methods are the small number of samples relative to the very high dimensionality of the data and the noisy nature of expression measurements. Alternatives are thus needed to improve the accuracy of GRN inference. Many pattern recognition techniques rely on prior knowledge about the problem, in addition to the training data, to gain statistical estimation power. This work addresses the GRN inference problem by modeling prior knowledge about the network topology. The main contribution of this paper is a novel methodology that incorporates scale-free properties into a classical low-cost feature selection method, Sequential Floating Forward Selection (SFFS), to guide the inference task. The methodology explores the search space iteratively, applying a scale-free property to reduce it. The search thus combines exhaustive exploration of all predictor subsets when the number of combinations is small (dimensionality ⟨k⟩ ≤ 2) with a floating search when the number of combinations becomes explosive (dimensionality ⟨k⟩ ≥ 3). This process is guided by scale-free prior information. Experimental results on synthetic and real data show that this technique yields smaller estimation errors than SFFS applied without scale-free guidance, while maintaining the robustness of the SFFS method. These results suggest that the proposed framework may be combined with other existing GRN inference methods to improve the prediction accuracy of networks with scale-free properties.
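The two-phase search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not name a criterion function, so mean conditional entropy is assumed here. The scale-free guidance is represented by a hypothetical penalty, `weight * gamma * log2(k)`, derived from a power-law in-degree prior P(k) ∝ k^(-gamma). The names `cond_entropy`, `make_cost`, and `infer_predictors` and the parameters `gamma` and `weight` are illustrative only.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import log2


def cond_entropy(X, y, subset):
    """Mean conditional entropy H(y | X[subset]) estimated from discrete samples."""
    groups = defaultdict(Counter)
    for row, target in zip(X, y):
        groups[tuple(row[i] for i in sorted(subset))][target] += 1
    n = len(y)
    h = 0.0
    for counts in groups.values():
        total = sum(counts.values())
        h -= total / n * sum(c / total * log2(c / total) for c in counts.values())
    return h


def make_cost(X, y, gamma, weight):
    """Assumed criterion: entropy plus a hypothetical power-law degree penalty."""
    def cost(subset):
        return cond_entropy(X, y, subset) + weight * gamma * log2(len(subset))
    return cost


def infer_predictors(X, y, candidates, max_dim=3, gamma=2.5, weight=0.1):
    """Exhaustive search for subsets of size <= 2, floating search for size >= 3."""
    cost = make_cost(X, y, gamma, weight)
    best = {}  # subset size -> (cost, subset) of the best subset seen at that size

    # Phase 1: evaluate every predictor subset of size 1 and 2.
    for d in (1, 2):
        for s in map(frozenset, combinations(candidates, d)):
            c = cost(s)
            if d not in best or c < best[d][0]:
                best[d] = (c, s)

    # Phase 2: SFFS-style floating search, started from the best small subset.
    current = best[max(best)][1]
    while len(current) < max_dim:
        remaining = [c for c in candidates if c not in current]
        if not remaining:
            break
        # Forward step: add the single most helpful predictor.
        current = current | {min(remaining, key=lambda c: cost(current | {c}))}
        c_cur = cost(current)
        if len(current) not in best or c_cur < best[len(current)][0]:
            best[len(current)] = (c_cur, current)
        # Backward steps: drop a predictor only if that beats the best subset
        # already known at the smaller size (sizes <= 2 are already optimal).
        while len(current) > 3:
            drop = min(current, key=lambda c: cost(current - {c}))
            reduced = current - {drop}
            c_red = cost(reduced)
            if c_red < best[len(reduced)][0]:
                best[len(reduced)] = (c_red, reduced)
                current = reduced
            else:
                break

    # Return the overall best subset, preferring smaller sets on ties.
    return set(min(best.values(), key=lambda t: (t[0], len(t[1])))[1])


# Toy data: the target gene is the XOR of genes 0 and 1; gene 2 is irrelevant.
X = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [a ^ b for a, b, _ in X]
predictors = infer_predictors(X, y, candidates=[0, 1, 2])
```

In this toy case the exhaustive phase already finds the pair of genes 0 and 1, and the floating phase does not improve on it because the penalty outweighs any third predictor. With real expression data the values would first need to be quantized, and the penalty weight would need tuning. Both steps are outside the scope of this sketch.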
