Gene Ranking of RNA-Seq Data via Discriminant Non-Negative Matrix Factorization

RNA-sequencing is rapidly becoming the method of choice for studying the full complexity of transcriptomes; however, as dimensionality increases, accurate gene ranking becomes increasingly challenging. This paper proposes an accurate and sensitive gene ranking method that applies discriminant non-negative matrix factorization (DNMF) to RNA-seq data. To the best of our knowledge, this is the first work to explore the utility of DNMF for gene ranking. By incorporating Fisher’s discriminant criterion and setting the reduced dimension to two, DNMF learns two factors that approximate the original gene expression data, using the sample label information to abstract an up-regulated and a down-regulated metagene. The first factor holds the weights of all genes in the two metagenes, each metagene being an additive combination of all genes, while the second factor represents the expression values of the two metagenes. In the gene ranking stage, all genes are ranked in descending order of the difference between their two metagene weights. By leveraging the nature of NMF and Fisher’s criterion, DNMF can robustly boost gene ranking performance. Area Under the Curve analysis of differential expression on two benchmarking tests of four RNA-seq data sets with similar phenotypes showed that the proposed DNMF-based gene ranking method outperforms other widely used methods. Gene Set Enrichment Analysis also showed that DNMF outperforms the other methods. DNMF is also computationally efficient, substantially outperforming all other benchmarked methods in this respect. Consequently, we suggest that DNMF is an effective method for differential gene expression analysis and gene ranking of RNA-seq data.
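
The ranking step described above can be illustrated with a minimal sketch. It is not the paper's algorithm: it uses plain rank-2 NMF with standard multiplicative updates as a stand-in and omits the Fisher discriminant term. The sample labels are used only afterwards, to decide which metagene is treated as up-regulated. The function names (`nmf_rank2`, `rank_genes`), the column normalization, and the label coding (0 and 1) are assumptions made for illustration.

```python
import numpy as np


def nmf_rank2(X, n_iter=500, seed=0, eps=1e-9):
    """Plain NMF (multiplicative updates), X ~ W @ H with two metagenes.

    Stand-in only: the Fisher discriminant term of DNMF is NOT included.
    X is a non-negative genes-by-samples matrix.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = X.shape
    W = rng.random((n_genes, 2)) + eps    # gene weights of the two metagenes
    H = rng.random((2, n_samples)) + eps  # metagene expression per sample
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
    return W, H


def rank_genes(X, labels, **nmf_kwargs):
    """Rank genes by the difference of their two metagene weights."""
    labels = np.asarray(labels)
    W, H = nmf_rank2(X, **nmf_kwargs)

    # Assumption: rescale so each metagene's weights sum to 1, which makes
    # the two columns of W comparable; H absorbs the scale.
    scale = W.sum(axis=0)
    W = W / scale
    H = H * scale[:, None]

    # Treat the metagene expressed more strongly in class 1 than in
    # class 0 as the "up-regulated" one (labels assumed coded 0/1).
    contrast = H[:, labels == 1].mean(axis=1) - H[:, labels == 0].mean(axis=1)
    up = int(np.argmax(contrast))
    down = 1 - up

    score = W[:, up] - W[:, down]
    order = np.argsort(-score)  # gene indices, highest score first
    return order, score


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.random((100, 10)) + 0.1  # toy data: 100 genes, 10 samples
    X[:5, 5:] += 3.0                 # first five genes raised in class 1
    labels = [0] * 5 + [1] * 5
    order, score = rank_genes(X, labels)
    print("top-ranked gene indices:", order[:5])
```

Because plain NMF is not unique and ignores the labels during factorization, the ordering from this sketch may differ from what a discriminant formulation would produce; it is meant only to make the "difference of metagene weights" scoring concrete.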
