Space-log: a novel approach to inferring gene-gene networks using the SPACE model with log penalty

Gene expression data have been used to infer gene-gene networks (GGN), in which an edge between two genes indicates that the two genes are conditionally dependent given all the other genes. Such gene-gene networks are often referred to as gene regulatory networks, since they may reveal expression regulation. Most existing methods for identifying GGN employ penalized regression with an L1 (lasso), L2 (ridge), or elastic net penalty, the last of which interpolates between the L1 and L2 penalties. However, for high-dimensional gene expression data, a penalty that interpolates between the L0 and L1 penalties, such as the log penalty, is often needed for variable selection consistency. We therefore develop a novel method that employs the log penalty within the framework of an earlier network identification method, space (Sparse PArtial Correlation Estimation), and implement it in an R package, space-log. We show that space-log is computationally efficient (its source code is implemented in C) and performs well compared with other methods, particularly for networks with hubs. Space-log is open source and available on GitHub at https://github.com/wuqian77/SpaceLog
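To make the contrast between the L1 and log penalties concrete, here is a minimal Python sketch. It is not the space-log package's C implementation, and it fits only a single regression of one gene on the others; space itself estimates partial correlations through a joint regression over all genes. The log penalty is assumed here to take the form λ Σ_j log(|β_j| + τ). It is minimized approximately by local linear approximation: starting from an ordinary lasso fit, each outer step solves a weighted lasso whose weight for coefficient j is λ/(|β_j| + τ), the derivative of the penalty at the current estimate. The function names, the lasso initializer, and the values of `lam` (λ) and `tau` (τ) are illustrative assumptions.

```python
import random

# Illustrative sketch (assumption): the log penalty is taken as
# lam * sum_j log(|b_j| + tau); this is not the space-log C code.


def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, weights, n_iter=100):
    """Cyclic coordinate descent for
    (1/(2n)) * ||y - X b||^2 + sum_j weights[j] * |b_j|.
    X is a list of n rows of p predictor values; no column may be all zeros."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - X b, starting from b = 0
    col_ss = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Inner product (scaled by 1/n) of column j with the partial
            # residual that excludes coefficient j's own contribution.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, weights[j]) / col_ss[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


def log_penalty_fit(X, y, lam, tau, n_outer=5):
    """Approximately minimize (1/(2n)) * ||y - X b||^2 + lam * sum_j log(|b_j| + tau)
    by local linear approximation (iteratively reweighted lasso)."""
    p = len(X[0])
    b = weighted_lasso(X, y, [lam] * p)  # ordinary lasso as the starting point
    for _ in range(n_outer):
        # Derivative of lam * log(|b_j| + tau) at the current estimate.
        weights = [lam / (abs(bj) + tau) for bj in b]
        b = weighted_lasso(X, y, weights)
    return b


# Hypothetical toy data: the response depends only on predictors 0 and 2.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(100)]
y = [3.0 * row[0] - 2.0 * row[2] + rng.gauss(0, 0.1) for row in X]

print("lasso:", [round(v, 2) for v in weighted_lasso(X, y, [0.2] * 5)])
print("log  :", [round(v, 2) for v in log_penalty_fit(X, y, lam=0.2, tau=1.0)])
```

Because the weight λ/(|β_j| + τ) shrinks as |β_j| grows, large coefficients are shrunk less than under the lasso, which reduces their bias, while coefficients at zero keep a weight of λ/τ. For large τ, log(|β| + τ) ≈ log τ + |β|/τ, so the penalty behaves like a rescaled L1 penalty; smaller τ moves it toward L0-like behavior.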
