Avoiding pitfalls in L1-regularised inference of gene networks.

Statistical regularisation methods such as LASSO and related L1-regularised regression methods are commonly used to construct models of gene regulatory networks. Although in theory they can infer the correct network structure, in practice they have been shown to make errors, i.e. to leave out existing links and to include non-existing ones. We show that L1-regularisation methods typically produce a poor network model when the analysed data are ill-conditioned, i.e. when the gene expression data matrix has a high condition number, even if the data contain enough information for correct network inference. However, these methods can fail on informative data, i.e. data whose signal-to-noise ratio is high enough that the existing links can be proven to exist. For such data, the correct network structure can still be obtained in two ways. One is least-squares regression followed by setting small parameters to zero. The other is robust network inference, a recent method that takes the intersection of all non-rejectable models. Since available experimental data sets are generally ill-conditioned, we recommend checking the condition number of the data matrix to avoid this pitfall of L1-regularised inference, and also considering alternative methods.
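The recommended condition-number check and the least-squares-plus-thresholding alternative can be sketched in a few lines of Python with NumPy. This is a minimal sketch under our own assumptions, not the authors' implementation. We assume the linear steady-state model A Y = -P, where Y is the genes x experiments expression matrix, P the perturbation matrix and A the network matrix. The helper names `condition_number` and `ls_threshold`, the example network and the threshold value are hypothetical. Robust network inference is not shown.

```python
import numpy as np


def condition_number(Y: np.ndarray) -> float:
    """Condition number of the data matrix: largest / smallest singular value."""
    # np.linalg.svd returns the singular values sorted in descending order.
    s = np.linalg.svd(Y, compute_uv=False)
    if s[-1] == 0.0:
        return float("inf")  # rank-deficient data matrix
    return float(s[0] / s[-1])


def ls_threshold(Y: np.ndarray, P: np.ndarray, threshold: float) -> np.ndarray:
    """Least-squares estimate of the network matrix A in the assumed model
    A @ Y = -P, with entries smaller in magnitude than `threshold` set to zero.

    Assumed shapes: Y and P are (genes x experiments); A is (genes x genes).
    """
    # Transpose to the standard form Y.T @ A.T = -P.T and solve for A.T.
    A_T, *_ = np.linalg.lstsq(Y.T, -P.T, rcond=None)
    A_hat = A_T.T.copy()
    # Setting small parameters to zero yields the inferred network structure.
    A_hat[np.abs(A_hat) < threshold] = 0.0
    return A_hat


if __name__ == "__main__":
    # Hypothetical 3-gene network; the true links are the non-zero entries.
    A = np.array([[-1.0, 0.0, 0.5],
                  [0.8, -1.0, 0.0],
                  [0.0, 0.6, -1.0]])
    P = np.eye(3)                   # one single-gene perturbation per experiment
    Y = -np.linalg.solve(A, P)      # noise-free steady-state responses
    rng = np.random.default_rng(0)
    Y_noisy = Y + 1e-3 * rng.standard_normal(Y.shape)  # small measurement noise

    print("condition number:", condition_number(Y_noisy))
    A_hat = ls_threshold(Y_noisy, P, threshold=0.1)
    print("structure recovered:", np.array_equal(A_hat != 0, A != 0))
```

`np.linalg.cond(Y)` returns the same 2-norm condition number directly. The threshold is supplied by hand here. Choosing it in a principled way, for example from the noise level, is outside the scope of this sketch.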
