Improving network inference algorithms using resampling methods

Background: Relatively small changes to gene expression data dramatically affect the co-expression networks inferred from that data, which in turn can significantly alter the subsequent biological interpretation. This error propagation is an underappreciated problem that, while hinted at in the literature, has not yet been thoroughly explored. Resampling methods (e.g. bootstrap aggregation, the random subspace method) are hypothesized to reduce variability in network inference by minimizing outlier effects and distilling persistent associations in the data. However, the efficacy of this approach rests on the assumption that results from statistical theory generalize to biological network inference applications.

Results: We evaluated the effect of bootstrap aggregation on networks inferred with commonly applied network inference methods, using three criteria: stability (resilience to perturbations in the underlying expression data), a metric for accuracy, and functional enrichment of edge interactions.

Conclusion: Bootstrap aggregation improves stability and, depending on the size of the input dataset, yields a marginal improvement in accuracy, as assessed by each method's ability to link genes in the same functional pathway.
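To make the idea of bootstrap aggregation for network inference concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the procedure used in the study: the base inference method is assumed to be absolute Pearson correlation, the expression matrix is assumed to have genes as rows and samples as columns, and the function names (`correlation_network`, `bagged_network`) and the parameter `n_boot` are hypothetical.

```python
import numpy as np


def correlation_network(expr):
    """Score every gene pair by absolute Pearson correlation.

    expr: 2-D array, genes as rows and samples as columns (assumed layout).
    Returns a symmetric genes x genes score matrix with a zero diagonal.
    """
    scores = np.abs(np.corrcoef(expr))
    # A gene that is constant within a resample yields NaN; treat as no edge.
    scores = np.nan_to_num(scores, nan=0.0)
    np.fill_diagonal(scores, 0.0)
    return scores


def bagged_network(expr, n_boot=100, seed=0):
    """Average edge scores over bootstrap resamples of the samples (columns).

    Each resample draws the same number of samples as the input, with
    replacement, re-infers the network, and the scores are averaged.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = expr.shape
    total = np.zeros((n_genes, n_genes))
    for _ in range(n_boot):
        cols = rng.integers(0, n_samples, size=n_samples)
        total += correlation_network(expr[:, cols])
    return total / n_boot
```

Averaging edge scores is only one possible aggregation rule; rank averaging or counting how often an edge exceeds a threshold are alternatives, and any other base inference method could be substituted for `correlation_network`.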
