PEAK: Integrating Curated and Noisy Prior Knowledge in Gene Regulatory Network Inference

With the abundance of biological data now available, computationally predicting gene regulatory networks (GRNs) from gene expression data has become more feasible. Incorporating prior knowledge (PK) from other sources alongside gene expression data greatly improves prediction accuracy, yet overall accuracy remains low. PK used in GRN inference can be categorized as noisy or curated. Noisy PK, such as transcription factor binding and protein-protein interactions, describes relations between genes that do not necessarily correspond to regulatory relations, so inference algorithms treat it as inaccurate. In contrast, curated PK consists of experimentally verified regulatory interactions recorded in pathway databases. A problem with real data is that gene expression may poorly support curated PK, so most existing prediction algorithms cannot use it. Although several algorithms have been proposed to incorporate noisy PK, none addresses curated PK with poor gene expression support. We present PEAK, a system that integrates both curated and noisy PK in GRN inference, particularly when gene expression support is poor. We introduce CurInf, a novel GRN inference method that effectively integrates curated PK even when the gene expression data poorly support it. For noisy PK, PEAK uses the previously proposed Modified Elastic Net method, which we call NoisInf. In our experiments, CurInf effectively incorporates curated PK that previous methods regarded as noise. Using 100% curated PK, CurInf improves the area under the precision-recall curve (AUPR) over NoisInf by 27.3% on synthetic data, 86.5% on Escherichia coli data, and 31.1% on Saccharomyces cerevisiae data. Moreover, even when the PK contains ten times as many noisy relations as true ones, PEAK performs better than inference without any PK. Better integration of curated PK lets biologists draw on verified experimental data to predict more reliable GRNs.
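The abstract says NoisInf reuses the Modified Elastic Net but does not give its formulation. One common way to let prior knowledge steer an elastic-net regression is to lower the penalty on regulators that the PK supports; the R glmnet package exposes this idea through its `penalty.factor` argument. The sketch below assumes that formulation and is only an illustration of the general idea, not the authors' NoisInf or CurInf implementation. Each target gene is regressed on candidate regulators, and a hypothetical weight vector `penalty_weights` scales each regulator's penalty, with a weight below 1 for regulators named in the PK. The function names and the simulated data are illustrative.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def weighted_elastic_net(X, y, lam, alpha=0.5, penalty_weights=None,
                         n_iter=1000, tol=1e-8):
    """Elastic net with per-regulator penalty weights, fit by cyclic coordinate descent.

    Minimizes
        (1 / (2n)) * ||y - X b||^2
        + lam * sum_j w_j * (alpha * |b_j| + (1 - alpha) / 2 * b_j^2)
    Hypothetical PK encoding (an assumption, not the paper's method):
    w_j < 1 shrinks regulator j less, so regulators named in the prior are favoured.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    w = np.ones(p) if penalty_weights is None else np.asarray(penalty_weights, dtype=float)
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # (1/n) * ||x_j||^2 for each regulator column
    r = y.copy()                        # current residual y - X b (b starts at zero)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant-zero column carries no information
            # (1/n) * x_j^T (partial residual that excludes regulator j's own term)
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam * alpha * w[j]) / (
                col_sq[j] + lam * (1.0 - alpha) * w[j])
            delta = new_bj - b[j]
            if delta != 0.0:
                r -= delta * X[:, j]  # keep the residual in sync with the new coefficient
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


# Usage on simulated data: regulator 0 has a strong effect on the target gene,
# regulator 1 a weak one that the (hypothetical) prior knowledge names.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = 2.0 * X[:, 0] + 0.3 * X[:, 1] + 0.1 * rng.standard_normal(200)
b_plain = weighted_elastic_net(X, y, lam=1.0, alpha=0.5)
b_prior = weighted_elastic_net(X, y, lam=1.0, alpha=0.5,
                               penalty_weights=[1.0, 0.1, 1.0, 1.0, 1.0])
print("no prior:  ", np.round(b_plain, 3))
print("with prior:", np.round(b_prior, 3))
```

With uniform penalties the weak regulator is shrunk to zero, while lowering its penalty lets it enter the model. This also shows the limitation the abstract points to: a penalty weight only nudges the fit, so a prior edge that the expression data barely support can still be dropped if its correlation with the target falls below the reduced threshold.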
