MapReduce Algorithms for Inferring Gene Regulatory Networks from Time-Series Microarray Data Using an Information-Theoretic Approach

Gene regulation is a series of processes that control gene expression and its extent. The connections between genes and their regulatory molecules, usually transcription factors, together with a descriptive model of those connections, are known as gene regulatory networks (GRNs). Elucidating GRNs is crucial to understanding the inner workings of the cell and the complexity of gene interactions. To date, numerous algorithms have been developed to infer gene regulatory networks. However, as the number of identified genes increases and the complexity of their interactions is uncovered, networks and their regulatory mechanisms become cumbersome to test. Furthermore, sifting through experimental results requires an enormous amount of computation, which slows data processing. New approaches are therefore needed to analyze quickly the large volumes of experimental data generated in GRN studies. Cloud computing has been reported in the literature as a promising way to meet this need. Here, we propose new MapReduce algorithms for inferring gene regulatory networks on a Hadoop cluster in a cloud environment. These algorithms employ an information-theoretic approach to infer GRNs from time-series microarray data. Experimental results show that our MapReduce program is much faster than an existing tool while achieving slightly better prediction accuracy.
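
As a rough illustration of how an information-theoretic, map-and-reduce style of inference can be organized, the sketch below scores each ordered gene pair by time-lagged mutual information in a map step and keeps the pairs above a cutoff in a reduce step. It is not the algorithm proposed here: the equal-width binning, the lag of one time point, the threshold value, and all function names are assumptions made for the example, and it runs in a single process rather than on Hadoop.

```python
import math
from collections import Counter
from itertools import permutations


def discretize(values, bins=3):
    """Equal-width binning of one expression profile (an assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(xs, ys):
    """Mutual information in bits between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def map_pair(pair, expr, lag=1):
    """Map step: score one ordered (regulator, target) pair.

    The regulator at time t is compared with the target at time t + lag,
    so lag must be at least 1.
    """
    regulator, target = pair
    xs = expr[regulator][:-lag]
    ys = expr[target][lag:]
    return (regulator, target), mutual_information(xs, ys)


def reduce_edges(scored, threshold):
    """Reduce step: keep the pairs whose score reaches the threshold."""
    return sorted(edge for edge, mi in scored if mi >= threshold)


def infer_network(series, threshold=0.5, lag=1, bins=3):
    """Run the map and reduce steps over all ordered gene pairs."""
    expr = {gene: discretize(vals, bins) for gene, vals in series.items()}
    pairs = permutations(sorted(expr), 2)
    scored = (map_pair(p, expr, lag) for p in pairs)
    return reduce_edges(scored, threshold)
```

Because each pair is scored independently, the map step is the part that would be distributed across worker nodes; the reduce step only needs the resulting (pair, score) records.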
