Optimal Sparsity Criteria for Network Inference

Gene regulatory network inference (that is, determination of the regulatory interactions between a set of genes) provides mechanistic insights of central importance to research in systems biology. Most contemporary network inference methods rely on a sparsity/regularization coefficient, which we call ζ (zeta), to determine the degree of sparsity of the network estimates, that is, the total number of links between the nodes. However, they offer little or no advice on how to select this sparsity coefficient, in particular for biological data with few samples. We show that an empty network is more accurate than estimates obtained with a poor choice of ζ. To avoid such poor choices, we propose a method for optimizing ζ that maximizes the accuracy of the inferred network for any sparsity-dependent inference method and data set. Our procedure is based on leave-one-out cross-optimization and selection of the ζ value that minimizes the prediction error. We also illustrate the adverse effects of noise, few samples, and uninformative experiments on network inference and on our method for optimizing ζ. We demonstrate that our ζ optimization method, applied to two widely used inference algorithms (Glmnet and NIR), gives accurate and informative estimates of the network structure, provided that the data are informative enough.
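
The selection step described above (leave each sample out in turn, fit on the rest, and keep the ζ with the lowest prediction error) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it substitutes a plain lasso solved by coordinate descent for Glmnet or NIR, it fits a single linear response rather than a full network, and the names `lasso_fit`, `loo_prediction_error`, and `select_zeta` as well as the synthetic data are hypothetical.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_fit(X, y, zeta, n_iter=100):
    """Minimise (1/2n)*||y - Xw||^2 + zeta*||w||_1 by coordinate descent.

    Assumption: a plain lasso stands in for the sparsity-dependent
    inference method; zeta plays the role of the sparsity coefficient.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw for the initial w = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes column j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, zeta) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


def loo_prediction_error(X, y, zeta):
    """Mean squared error of predicting each sample from a fit on all others."""
    n = len(X)
    total = 0.0
    for left_out in range(n):
        X_train = [X[i] for i in range(n) if i != left_out]
        y_train = [y[i] for i in range(n) if i != left_out]
        w = lasso_fit(X_train, y_train, zeta)
        prediction = sum(xj * wj for xj, wj in zip(X[left_out], w))
        total += (y[left_out] - prediction) ** 2
    return total / n


def select_zeta(X, y, zetas):
    """Return the candidate zeta with the lowest leave-one-out error, plus all errors."""
    errors = {zeta: loo_prediction_error(X, y, zeta) for zeta in zetas}
    best = min(errors, key=errors.get)
    return best, errors


# Hypothetical data: 12 samples, 5 candidate regulators, 2 true links.
random.seed(0)
X_demo = [[random.gauss(0, 1) for _ in range(5)] for _ in range(12)]
y_demo = [2.0 * row[0] - 1.5 * row[3] + random.gauss(0, 0.1) for row in X_demo]

best_zeta, loo_errors = select_zeta(X_demo, y_demo, [0.001, 0.01, 0.1, 1.0, 10.0])
print("selected zeta:", best_zeta)
print("weights at selected zeta:", lasso_fit(X_demo, y_demo, best_zeta))
```

A very large ζ drives every weight to zero, which corresponds to the empty network mentioned above, so including such a value in the candidate grid gives a baseline against which the other choices can be compared.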
