Reconstruction of Large-Scale Gene Regulatory Networks Using Regression-based Models

Gene regulatory network (GRN) reconstruction is the process of identifying gene regulatory interactions from experimental data through computational analysis. Work on GRN reconstruction has contributed to major discoveries of drug targets for the treatment of human diseases, including cancer. However, reconstructing GRNs from gene expression data is a challenging problem because of high dimensionality combined with a very limited number of observations, severe multicollinearity, and a tendency to generate cascade errors. These problems reduce the performance of GRN inference methods and make their results unreliable for scientific use. We propose a method called P-CALS (Principal Component Analysis and Partial Least Squares), which combines PCA (Principal Component Analysis) with PLS (Partial Least Squares). The performance of P-CALS is assessed on the genome-scale GRNs of E. coli and S. cerevisiae and on in silico data. P-CALS achieved satisfactory results: all of the sub-networks from the diverse datasets achieved AUROC values above 0.5, and gene relationships were discovered even in the most complex network tested in the experiments.
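
To make the AUROC criterion concrete, the following is a minimal sketch of how an AUROC value can be computed for a ranked list of candidate edges, using the standard rank-sum (Mann-Whitney) formulation. It is an illustration only and not the evaluation code used for P-CALS; the function name `auroc` and the toy scores and labels are hypothetical. A scoring that cannot distinguish true edges from non-edges yields 0.5, which is why values above 0.5 indicate better-than-chance edge ranking.

```python
def auroc(scores, labels):
    """AUROC via the rank-sum (Mann-Whitney U) formulation.

    scores: predicted confidence for each candidate edge (higher = more likely).
    labels: 1 if the edge is in the reference network, else 0.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one true edge and one non-edge")
    # Count (true edge, non-edge) pairs where the true edge scores higher;
    # tied pairs count as half a win.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six candidate regulator -> target edges.
edge_scores = [0.9, 0.8, 0.4, 0.35, 0.3, 0.1]
edge_labels = [1, 0, 1, 0, 1, 0]
toy_auroc = auroc(edge_scores, edge_labels)

# Identical scores carry no ranking information, so the result is exactly 0.5.
chance_auroc = auroc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
```

The pairwise loop is quadratic in the number of candidate edges, so for genome-scale networks a sorted, rank-based implementation would be preferable; the quadratic form is used here only because it mirrors the definition directly.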
