Nonconvex Penalty Based Low-Rank Representation and Sparse Regression for eQTL Mapping

This paper addresses the problem of accounting for confounding factors and expression quantitative trait loci (eQTL) mapping in the study of SNP-gene associations. The existing convex penalty based algorithm has limited capacity to keep main information of matrix in the process of reducing matrix rank. We present an algorithm, which use nonconvex penalty based low-rank representation to account for confounding factors and make use of sparse regression for eQTL mapping (NCLRS). The efficiency of the presented algorithm is evaluated by comparing the results of 18 synthetic datasets given by NCLRS and presented algorithm, respectively. The experimental results or biological dataset show that our approach is an effective tool to account for non-genetic effects than currently existing methods.

[1]  L. Liang,et al.  Mapping complex disease traits with global gene expression , 2009, Nature Reviews Genetics.

[2]  MengChu Zhou,et al.  Highly Efficient Framework for Predicting Interactions Between Proteins. , 2017, IEEE transactions on cybernetics.

[3]  Beth Holloway,et al.  Genome-wide expression quantitative trait loci (eQTL) analysis in maize , 2011, BMC Genomics.

[4]  D.-S. Huang,et al.  Radial Basis Probabilistic Neural Networks: Model and Application , 1999, Int. J. Pattern Recognit. Artif. Intell..

[5]  Xiaobo Zhou,et al.  Systemic modeling myeloma-osteoclast interactions under normoxic/hypoxic condition using a novel computational approach , 2015, Scientific Reports.

[6]  Simon C. K. Shiu,et al.  Molecular Pattern Discovery Based on Penalized Matrix Decomposition , 2011, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[7]  J. Friedman Fast sparse regression and classification , 2012 .

[8]  De-Shuang Huang,et al.  Predicting Hub Genes Associated with Cervical Cancer through Gene Co-Expression Networks , 2016, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[9]  Zhu-Hong You,et al.  t-LSE: A Novel Robust Geometric Approach for Modeling Protein-Protein Interaction Networks , 2013, PloS one.

[10]  Donald Geman,et al.  Nonlinear image recovery with half-quadratic regularization , 1995, IEEE Trans. Image Process..

[11]  Marc Teboulle,et al.  A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems , 2009, SIAM J. Imaging Sci..

[12]  Zhen Wang,et al.  SFAPS: An R package for structure/function analysis of protein sequences based on informational spectrum method , 2013, 2013 IEEE International Conference on Bioinformatics and Biomedicine.

[13]  De-Shuang Huang,et al.  A General CPL-AdS Methodology for Fixing Dynamic Parameters in Dual Environments , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[14]  Rachel B. Brem,et al.  The landscape of genetic complexity across 5,700 gene expression traits in yeast. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[15]  J. Friedman,et al.  A Statistical View of Some Chemometrics Regression Tools , 1993 .

[16]  De-Shuang Huang,et al.  Independent component analysis-based penalized discriminant method for tumor classification using gene expression data , 2006, Bioinform..

[17]  De-Shuang Huang,et al.  NMFBFS: A NMF-Based Feature Selection Method in Identifying Pivotal Clinical Symptoms of Hepatocellular Carcinoma , 2015, Comput. Math. Methods Medicine.

[18]  De-Shuang Huang,et al.  ChIP-PIT: Enhancing the Analysis of ChIP-Seq Data Using Convex-Relaxed Pair-Wise Interaction Tensor Decomposition , 2016, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[19]  De-Shuang Huang,et al.  A Constructive Hybrid Structure Optimization Methodology for Radial Basis Probabilistic Neural Networks , 2008, IEEE Transactions on Neural Networks.

[20]  Lei Zhang,et al.  Prediction of protein-protein interactions based on protein-protein correlation using least squares regression. , 2014, Current protein & peptide science.

[21]  L. Kruglyak,et al.  Gene–Environment Interaction in Yeast Gene Expression , 2008, PLoS biology.

[22]  De-Shuang Huang,et al.  A Two-Stage Geometric Method for Pruning Unreliable Links in Protein-Protein Networks , 2015, IEEE Transactions on NanoBioscience.

[23]  Lin Wang,et al.  Accounting for non-genetic factors by low-rank representation and sparse regression for eQTL mapping , 2013, Bioinform..

[24]  Armando Manduca,et al.  Highly Undersampled Magnetic Resonance Image Reconstruction via Homotopic $\ell_{0}$ -Minimization , 2009, IEEE Transactions on Medical Imaging.

[25]  R. Tibshirani,et al.  A fused lasso latent feature model for analyzing multi-sample aCGH data. , 2011, Biostatistics.

[26]  Wei Cheng,et al.  Graph-regularized dual Lasso for robust eQTL mapping , 2014, Bioinform..

[27]  Jianqing Fan,et al.  Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties , 2001 .

[28]  James T. L. Mah,et al.  A Gentle Introduction to SNP Analysis: Resources and Tools , 2007, J. Bioinform. Comput. Biol..

[29]  Kung-Sik Chan,et al.  Reduced rank regression via adaptive nuclear norm penalization. , 2012, Biometrika.

[30]  De-Shuang Huang,et al.  Mining the bladder cancer-associated genes by an integrated strategy for the construction and analysis of differential co-expression networks , 2015, BMC Genomics.

[31]  De-Shuang Huang,et al.  Normalized Feature Vectors: A Novel Alignment-Free Sequence Comparison Method Based on the Numbers of Adjacent Amino Acids , 2013, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[32]  Tsun-Po Yang,et al.  Genevar: a database and Java application for the analysis and visualization of SNP-gene associations in eQTL studies , 2010, Bioinform..

[33]  Zhihua Zhang,et al.  A Feasible Nonconvex Relaxation Approach to Feature Selection , 2011, AAAI.

[34]  Tong Zhang,et al.  Analysis of Multi-stage Convex Relaxation for Sparse Regularization , 2010, J. Mach. Learn. Res..

[35]  Yoshihiro Ugawa,et al.  Plant cis-acting regulatory DNA elements (PLACE) database: 1999 , 1999, Nucleic Acids Res..

[36]  Wei Cheng,et al.  Fast and robust group-wise eQTL mapping using sparse graphical models , 2015, BMC Bioinformatics.

[37]  I. Daubechies,et al.  Iteratively reweighted least squares minimization for sparse recovery , 2008, 0807.0575.

[38]  Cun-Hui Zhang Nearly unbiased variable selection under minimax concave penalty , 2010, 1002.4734.

[39]  Lei Zhang,et al.  Tumor Clustering Using Nonnegative Matrix Factorization With Gene Selection , 2009, IEEE Transactions on Information Technology in Biomedicine.

[40]  David Heckerman,et al.  Correction for hidden confounders in the genetic analysis of gene expression , 2010, Proceedings of the National Academy of Sciences.

[41]  Hans-Jürgen Thiesen,et al.  Cis- and trans-acting gene regulation is associated with osteoarthritis. , 2006, American journal of human genetics.

[42]  Stéphane Canu,et al.  Recovering Sparse Signals With a Certain Family of Nonconvex Penalties and DC Programming , 2009, IEEE Transactions on Signal Processing.

[43]  R. Tibshirani,et al.  Spatial smoothing and hot spot detection for CGH data using the fused lasso. , 2008, Biostatistics.

[44]  Robert Tibshirani,et al.  Spectral Regularization Algorithms for Learning Large Incomplete Matrices , 2010, J. Mach. Learn. Res..

[45]  Andrey A. Shabalin,et al.  Matrix eQTL: ultra fast eQTL analysis via large matrix operations , 2011, Bioinform..

[46]  Jason G. Mezey,et al.  HEFT: eQTL analysis of many thousands of expressed genes while simultaneously controlling for hidden factors , 2014, Bioinform..