Efficient Boolean Modeling of Gene Regulatory Networks via Random Forest Based Feature Selection and Best-Fit Extension

Gene regulatory networks play a critical role in cellular behavior and decision-making. Mathematical modeling of these networks can help unravel the complexity of gene regulation and provide insight into key biological processes at the cellular level. In this paper, we focus on building Boolean models of gene regulatory networks from time series gene expression data. Because the two classic methods, REVEAL and Best-Fit Extension, are both computationally expensive and scale poorly to large networks, we propose a novel hybrid approach that combines random forest based feature selection with the Best-Fit Extension algorithm. The feature selection step rules out most of the incorrect candidate regulators, which significantly reduces the workload of the subsequent Best-Fit Extension fitting procedure. The efficiency and performance of the proposed two-stage framework are analyzed theoretically and validated with synthetic datasets generated from the core regulatory network active in myeloid differentiation.
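To make the second stage concrete, the following is a minimal sketch of a Best-Fit style search restricted to a pre-selected candidate set. It is not the authors' implementation: the function name `best_fit`, the `max_inputs` bound, the toy three-gene rule in `step`, and the hand-written candidate list are all assumptions made for illustration. The random forest ranking itself is not shown; the `candidates` argument simply stands in for whatever regulators that stage would retain.

```python
from collections import defaultdict
from itertools import combinations, product


def best_fit(transitions, target, candidates, max_inputs=2):
    """Search regulator subsets of `candidates` (size <= max_inputs) and
    return (error, regulators, truth_table) with the fewest mismatches.

    `transitions` is a list of (current_state, next_state) pairs of
    binary tuples. Hypothetical helper, written for illustration only.
    """
    best = None
    for k in range(1, max_inputs + 1):
        for regs in combinations(candidates, k):
            # For every observed input pattern, count how often the
            # target is 0 and how often it is 1 at the next time step.
            counts = defaultdict(lambda: [0, 0])
            for cur, nxt in transitions:
                key = tuple(cur[g] for g in regs)
                counts[key][nxt[target]] += 1
            # The best Boolean function predicts the majority value for
            # each pattern, so the minority count is the unavoidable error.
            error = sum(min(c) for c in counts.values())
            if best is None or error < best[0]:
                table = {key: int(c[1] > c[0]) for key, c in counts.items()}
                best = (error, regs, table)
    return best


def step(s):
    # Toy rule set (an assumption): g0' = g1 AND g2, g1' = g0, g2' = NOT g1.
    return (s[1] & s[2], s[0], 1 - s[1])


# Noise-free transitions from every state of the toy network.
transitions = [(s, step(s)) for s in product([0, 1], repeat=3)]

# Unrestricted search over all three genes as candidate regulators of g0.
full = best_fit(transitions, target=0, candidates=[0, 1, 2])

# Restricted search, as if feature selection had already ruled out g0.
pruned = best_fit(transitions, target=0, candidates=[1, 2])

print(full)
print(pruned)
```

With all three genes as candidates the search evaluates six regulator subsets; with two candidates it evaluates three. The number of subsets grows combinatorially with the candidate count, which is why pruning candidates before the fitting step reduces the workload.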
