Gene Extraction Based on Sparse Singular Value Decomposition

In this paper, we develop a new feature extraction method based on sparse singular value decomposition (SSVD). We apply the SSVD algorithm to select characteristic genes from a colorectal cancer (CRC) genomic dataset and then evaluate the resulting differentially expressed genes with Gene Ontology-based tools. As a gene extraction method, SSVD is also compared with several existing feature extraction methods: independent component analysis (ICA), p-norm robust feature extraction (PREE), and sparse principal component analysis (SPCA). The experimental results show that SSVD outperforms these methods.
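The abstract does not spell out the SSVD procedure, so here is a minimal sketch of the core idea under stated assumptions. The expression matrix is assumed to be arranged genes × samples. A single sparse layer is extracted by alternating soft-thresholded power iterations, and genes with nonzero left loadings are treated as the extracted genes. The fixed thresholds `lam_u` and `lam_v` and the function names are hypothetical. Published SSVD variants choose penalties adaptively (for example, adaptive-lasso weights tuned by BIC), and this sketch omits that step.

```python
import numpy as np

def soft_threshold(x, lam):
    """Elementwise soft-thresholding: sign(x) * max(|x| - lam, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def sparse_rank_one_svd(X, lam_u, lam_v, n_iter=100, tol=1e-8):
    """Hypothetical rank-one sparse SVD via alternating soft-thresholded
    power iterations with fixed thresholds (a simplification, not the
    adaptive-lasso/BIC procedure of published SSVD)."""
    # Initialise with the leading singular vectors of the ordinary SVD.
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    u, v = U[:, 0], Vt[0, :]
    for _ in range(n_iter):
        # Update the left (gene) vector, then sparsify and renormalise.
        u_new = soft_threshold(X @ v, lam_u)
        if not u_new.any():  # threshold too large: empty layer
            return 0.0, np.zeros_like(u), np.zeros_like(v)
        u_new /= np.linalg.norm(u_new)
        # Update the right (sample) vector the same way.
        v_new = soft_threshold(X.T @ u_new, lam_v)
        if not v_new.any():
            return 0.0, np.zeros_like(u), np.zeros_like(v)
        v_new /= np.linalg.norm(v_new)
        converged = (np.linalg.norm(u_new - u) < tol
                     and np.linalg.norm(v_new - v) < tol)
        u, v = u_new, v_new
        if converged:
            break
    d = u @ X @ v  # singular value of the sparse layer
    return d, u, v

# Synthetic example: 50 genes x 20 samples, signal planted in genes 0-4.
rng = np.random.default_rng(0)
n_genes, n_samples = 50, 20
u0 = np.zeros(n_genes)
u0[:5] = 1 / np.sqrt(5)
v0 = np.ones(n_samples) / np.sqrt(n_samples)
X = 5.0 * np.outer(u0, v0) + 0.1 * rng.standard_normal((n_genes, n_samples))

d, u, v = sparse_rank_one_svd(X, lam_u=0.5, lam_v=0.2)
selected = np.flatnonzero(u)  # indices of "extracted" genes
print(selected)
```

On this synthetic matrix the sparse left vector should isolate the planted genes. On real data, the thresholds would need to be tuned, and further layers could be extracted by subtracting `d * np.outer(u, v)` from `X` and repeating.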