A generalized framework for controlling FDR in gene regulatory network inference

Motivation: Inference of gene regulatory networks (GRNs) from perturbation data can give detailed mechanistic insights into a biological system. Many inference methods exist, but the resulting GRN is generally sensitive to the choice of method‐specific parameters. Even though the inferred GRN is optimal given the parameters, many links may be wrong or missing if the data are not informative. To make GRN inference reliable, a method is needed to estimate the support of each predicted link as the method parameters are varied.

Results: To achieve this, we have developed a method called nested bootstrapping, which applies a bootstrapping protocol to GRN inference and, through repeated bootstrap runs, assesses the stability of the estimated support values. To translate bootstrap support values into false discovery rates (FDRs), we run the same pipeline with shuffled data as input. This provides a general method to control the FDR of GRN inference that can be applied to any setting of inference parameters, noise level, or data properties. We evaluated nested bootstrapping on a simulated dataset spanning a range of such properties, using the LASSO, Least Squares, RNI, GENIE3 and CLR inference methods. Inference accuracy improved in almost all situations. Nested bootstrapping was incorporated into the GeneSPIDER package, which was also used to generate the simulated networks and data and to run and analyze the inferences.

Availability and implementation: https://bitbucket.org/sonnhammergrni/genespider/src/NB/%2BMethods/NestBoot.m
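The procedure described under Results can be illustrated with a minimal Python sketch. It is not the GeneSPIDER NestBoot.m implementation: all function names are hypothetical, the inference step is a toy thresholded least-squares estimate under an assumed steady-state model Y ≈ -A⁻¹P, and the shuffling scheme (permuting gene labels within each experiment) and the FDR estimate (null links divided by measured-data links at a given support level) are assumptions made only for illustration.

```python
import numpy as np

def infer_links(Y, P, cutoff):
    """Toy inference (assumption): least-squares estimate of A from
    Y ~ -inv(A) @ P, followed by a hard threshold on |A|."""
    A_est = -P @ np.linalg.pinv(Y)
    return np.abs(A_est) > cutoff

def bootstrap_support(Y, P, cutoff, n_boot, rng):
    """Fraction of bootstrap samples in which each link is inferred."""
    n_genes, n_exp = Y.shape
    counts = np.zeros((n_genes, n_genes))
    for _ in range(n_boot):
        idx = rng.integers(0, n_exp, size=n_exp)  # resample experiments with replacement
        counts += infer_links(Y[:, idx], P[:, idx], cutoff)
    return counts / n_boot

def nested_support(Y, P, cutoff, n_outer, n_boot, rng):
    """Repeat the whole bootstrap run n_outer times; shape (n_outer, genes, genes)."""
    return np.stack([bootstrap_support(Y, P, cutoff, n_boot, rng)
                     for _ in range(n_outer)])

def shuffle_data(Y, rng):
    """Null data (assumed scheme): permute gene labels within each experiment."""
    return np.stack([rng.permutation(Y[:, j]) for j in range(Y.shape[1])], axis=1)

def estimated_fdr(support_real, support_null, s):
    """Links reaching support s in the shuffled runs, relative to the measured runs."""
    n_real = np.sum(support_real >= s)
    n_null = np.sum(support_null >= s)
    return float(min(1.0, n_null / max(n_real, 1)))

# Small simulated example: 5 genes, 3 replicates of single-gene perturbations.
rng = np.random.default_rng(0)
n = 5
A = -np.eye(n)
A[0, 1] = 0.8
A[2, 0] = -0.6
A[3, 4] = 0.7
P = np.tile(np.eye(n), 3)                                   # 5 x 15 perturbation design
Y = -np.linalg.inv(A) @ P + 0.05 * rng.standard_normal(P.shape)

real = nested_support(Y, P, cutoff=0.3, n_outer=3, n_boot=50, rng=rng)
null = nested_support(shuffle_data(Y, rng), P, cutoff=0.3, n_outer=3, n_boot=50, rng=rng)

spread = real.std(axis=0)  # per-link variability of support across the outer runs
for s in (0.5, 0.8, 0.95):
    print(f"support >= {s:.2f}: estimated FDR = {estimated_fdr(real, null, s):.2f}")
```

In this sketch, a support cutoff would be chosen as the lowest value of `s` whose estimated FDR falls below a target level, and `spread` indicates how stable each link's support is across the repeated bootstrap runs.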
