A Lasso logistic regression-based approach for extracting plant coregenes responding to abiotic stresses

Sparse methods have the significant advantage of reducing the complexity of gene expression data, making the data easier to comprehend and interpret. In this paper, we propose a novel approach based on Lasso Logistic Regression (LLR) to extract plant characteristic gene sets, namely coregenes, that respond to abiotic stresses. First, lasso logistic regression is performed on the samples to obtain the regression coefficients. Then, the regression coefficients are sorted by absolute value. Finally, the genes corresponding to the nonzero entries of the coefficients are selected as the coregenes. Each extracted coregene captures the changes in the samples belonging to the same condition. The experimental results show that the proposed LLR-based method is efficient at extracting coregenes directly related to the stresses.
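The three steps above (fit, sort by absolute coefficient, keep the nonzero entries) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the proximal-gradient solver, the penalty weight `lam`, the step size, the gene names `G0` to `G5`, and the synthetic expression values. The abstract does not state which solver or settings the authors used.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrinks value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso_logistic(X, y, lam=0.1, step=0.5, n_iter=1000):
    """Minimise mean logistic loss + lam * ||w||_1 by proximal gradient descent.

    The solver and its default settings are illustrative assumptions.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b  # the intercept is not penalised
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


def select_coregenes(w, gene_names):
    """Keep genes with nonzero coefficients, sorted by absolute coefficient."""
    nonzero = [(name, coef) for name, coef in zip(gene_names, w) if coef != 0.0]
    return sorted(nonzero, key=lambda pair: abs(pair[1]), reverse=True)


# Hypothetical toy data: 40 samples, 6 genes; label 1 = stressed, 0 = control.
random.seed(0)
genes = ["G0", "G1", "G2", "G3", "G4", "G5"]
X, y = [], []
for i in range(40):
    label = i % 2
    sign = 1.0 if label == 1 else -1.0
    row = [random.uniform(-0.1, 0.1) for _ in genes]  # weak background noise
    # Half of the samples carry the signal in G0, the other half in G3.
    if (i // 2) % 2 == 0:
        row[0], row[3] = sign, 0.0         # G0: up-regulated under stress
    else:
        row[0], row[3] = 0.0, -0.5 * sign  # G3: weaker, down-regulated
    X.append(row)
    y.append(label)

w, b = fit_lasso_logistic(X, y)
coregenes = select_coregenes(w, genes)
for name, coef in coregenes:
    print(name, round(coef, 3))
```

On this toy data the L1 penalty drives the coefficients of the four noise genes to exactly zero, so only `G0` and `G3` are returned, with `G0` ranked first because its coefficient is larger in absolute value. The sign of each coefficient indicates the direction of regulation under stress in this hypothetical setup.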
