A group LASSO-based method for robustly inferring gene regulatory networks from multiple time-course datasets

Background: As an abstract mapping of gene regulation in the cell, a gene regulatory network is important to both biological research and practical applications. Reverse engineering gene regulatory networks from microarray gene expression data is a challenging research problem in systems biology. With the development of biological technologies, multiple time-course gene expression datasets may be collected for a specific gene network under different circumstances. Integrating these multiple datasets can improve the inference of a gene regulatory network. Gene expression data may also be contaminated with large errors or outliers, which can affect the inference results.

Results: A novel method, Huber group LASSO, is proposed to infer the same underlying network topology from multiple time-course gene expression datasets while remaining robust to large errors or outliers. To solve the optimization problem involved in the proposed method, an efficient algorithm that combines the ideas of auxiliary function minimization and block descent is developed. A stability selection method is adapted to the proposed method to find a network topology consisting of edges with scores. The proposed method is applied to both simulated datasets and real experimental datasets. The results show that Huber group LASSO outperforms the group LASSO in terms of both the area under the receiver operating characteristic curve and the area under the precision-recall curve.

Conclusions: The convergence analysis shows theoretically that the sequence generated by the algorithm converges to the optimal solution of the problem. The simulation and real data examples demonstrate the effectiveness of the Huber group LASSO in integrating multiple time-course gene expression datasets and improving the resistance to large errors or outliers.
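The abstract does not state the objective function, so the following is only a minimal sketch of what a Huber group LASSO objective could look like, under stated assumptions: each target gene is regressed separately, dataset k supplies a design matrix `Xs[k]` and response `ys[k]`, the coefficients form a matrix `B` with one column per dataset, and each row of `B` (one candidate regulator across all datasets) is penalized as a group. The names `huber`, `objective`, `group_soft_threshold`, `lam`, and `delta` are hypothetical and not taken from the paper. The group soft-thresholding helper is the standard proximal step used in block-wise group LASSO updates; it is not claimed to be the authors' exact update rule.

```python
import numpy as np


def huber(r, delta):
    """Elementwise Huber loss: quadratic for |r| <= delta, linear beyond."""
    r = np.asarray(r, dtype=float)
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r ** 2, delta * (a - 0.5 * delta))


def objective(Xs, ys, B, lam, delta):
    """Assumed objective: Huber loss summed over datasets plus a group penalty.

    Xs, ys : lists of per-dataset design matrices (n_k x p) and responses (n_k,)
    B      : (p x K) coefficients; row j is the group for candidate regulator j
    """
    loss = sum(
        huber(y - X @ B[:, k], delta).sum()
        for k, (X, y) in enumerate(zip(Xs, ys))
    )
    # Sum of row-wise Euclidean norms: a whole row is shrunk to zero together,
    # which encourages the same edge set across all datasets.
    penalty = lam * np.linalg.norm(B, axis=1).sum()
    return loss + penalty


def group_soft_threshold(v, t):
    """Shrink the vector v toward zero by t in Euclidean norm (zero if ||v|| <= t)."""
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v
```

Because the Huber loss grows only linearly for residuals larger than `delta`, a single large outlier contributes far less to this objective than it would under a squared-error loss, which is the robustness property the abstract refers to.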
