Differentially expressed genes selection via Laplacian regularized low-rank representation method

With the rapid development of DNA microarray technology and next-generation sequencing technology, a large amount of genomic data has been generated. How to extract more differentially expressed genes from genomic data has therefore become an urgent problem. Because Low-Rank Representation (LRR) performs well in recovering low-dimensional subspace structures, it has attracted considerable attention in recent years. However, it does not take into account the intrinsic geometric structure of the data. In this paper, a new method named Laplacian regularized Low-Rank Representation (LLRR), which introduces graph regularization into LRR, is proposed and applied to genomic data. By taking full advantage of graph regularization, the LLRR method can capture the intrinsic non-linear geometric information in the data. The LLRR method decomposes the observation matrix of genomic data into a low-rank matrix and a sparse matrix by solving an optimization problem. Because the significant genes can be considered as sparse signals, the differentially expressed genes are viewed as sparse perturbation signals. Therefore, the differentially expressed genes can be selected according to the sparse matrix. Finally, we use the GO tool to analyze the selected genes and compare the P-values with those of other methods. The results on simulated data and two real genomic datasets illustrate that this method outperforms several other methods in differentially expressed gene selection.
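
The final selection step, choosing genes according to the sparse matrix, can be illustrated with a minimal sketch. It assumes the sparse matrix has already been produced by some LLRR solver, that rows correspond to genes and columns to samples, and that genes are ranked by the sum of absolute values in their row. The function name `select_genes`, the scoring rule, and the toy values are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def select_genes(S, gene_names, k):
    """Rank genes by the magnitude of their rows in a sparse matrix S.

    Assumptions (not from the paper): S has shape (genes, samples),
    and a gene's score is the sum of absolute values in its row.
    """
    S = np.asarray(S, dtype=float)
    scores = np.abs(S).sum(axis=1)          # one score per gene (row)
    order = np.argsort(-scores, kind="stable")[:k]  # largest scores first
    return [(gene_names[i], float(scores[i])) for i in order]

# Toy sparse matrix: most entries are zero, a few genes carry large perturbations.
S = np.array([
    [0.0,  0.0, 0.0],
    [2.0, -1.5, 0.0],
    [0.0,  0.1, 0.0],
    [-3.0, 0.0, 4.0],
])
names = ["g1", "g2", "g3", "g4"]
top = select_genes(S, names, 2)
print(top)
```

Genes whose rows are entirely or nearly zero receive low scores and are not selected, which matches the idea that differentially expressed genes appear as sparse perturbations.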
