PREMER: A Tool to Infer Biological Networks

Inferring the structure of unknown cellular networks is a major challenge in computational biology. Data-driven approaches based on information theory can automatically determine whether interactions exist among network nodes. However, elucidating certain features, such as distinguishing between direct and indirect interactions or determining the direction of a causal link, requires estimating information-theoretic quantities in a multidimensional space. This is computationally demanding and acts as a bottleneck when applying elaborate algorithms to large-scale network inference problems. The cost of these calculations can be reduced by using compiled programs and parallelization. To this end, we have developed PREMER (Parallel Reverse Engineering with Mutual information & Entropy Reduction), a software toolbox that can run in parallel and sequential environments. It uses information-theoretic criteria to recover network topology and to determine the strength and causality of interactions, and it allows incorporating prior knowledge, imputing missing data, and correcting outliers. PREMER is a free, open-source software tool that does not require any commercial software. Its core algorithms are written in FORTRAN 90 and parallelized with OpenMP directives. It has user interfaces in Python and MATLAB/Octave, and runs on Windows, Linux, and OS X (https://sites.google.com/site/premertoolbox/).
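To make the kind of quantity involved concrete, the following is a minimal Python sketch of a pairwise mutual information matrix computed with a simple histogram (plug-in) estimator. It is an illustration only and not PREMER's code or interface: the function names `mutual_information` and `pairwise_mi`, the fixed bin count, and the synthetic data are all assumptions made for this example. The estimator PREMER actually uses, and its multidimensional extensions, may differ.

```python
import numpy as np


def mutual_information(x, y, bins=10):
    """Histogram (plug-in) estimate of I(X;Y) in bits.

    Illustrative helper; not part of PREMER.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()              # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)    # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)    # marginal of y, shape (1, bins)
    nz = pxy > 0                           # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))


def pairwise_mi(data, bins=10):
    """Symmetric matrix of pairwise MI for data of shape (n_samples, n_vars)."""
    n_vars = data.shape[1]
    mi = np.zeros((n_vars, n_vars))
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            mi[i, j] = mi[j, i] = mutual_information(data[:, i], data[:, j], bins)
    return mi


# Synthetic data (hypothetical): b depends strongly on a, c is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=2000)
b = a + 0.1 * rng.normal(size=2000)
c = rng.normal(size=2000)
mi = pairwise_mi(np.column_stack([a, b, c]))
```

In this toy setting, the entry for the dependent pair (a, b) comes out much larger than the entry for the independent pair (a, c). The double loop over variable pairs is the part that grows quadratically with network size; each pair is independent of the others, which is the sort of workload that compiled code and shared-memory parallelization are suited to.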
