JCD-DEA: a joint covariate detection tool for differential expression analysis on tumor expression profiles

Background: Differential expression analysis on tumor expression profiles is a key step preceding biological experimental validation. A central question is how to select the features that best discriminate between different groups of patients. Despite the emergence of multivariate analysis approaches, prevailing feature selection methods primarily perform multiple hypothesis testing on individual variables and then combine the selected variables into an explanatory result. Moreover, these methods, which are commonly based on hypothesis testing, treat classification only as a posterior validation of the selected variables.

Results: Based on the previously proposed A5 feature selection strategy, we develop a joint covariate detection tool for differential expression analysis on tumor expression profiles. The software combines hypothesis testing with testing according to classification results. A model selection approach based on the Gaussian mixture model is introduced for automatic selection of features. In addition, a projection heatmap is proposed for the first time.

Conclusions: Joint covariate detection strengthens the case for selecting variables that are not only individually but also jointly significant. Experiments on simulated and real data show the effectiveness of the developed software, which enhances the reliability of joint covariate detection for differential expression analysis on tumor expression profiles. The software is available at http://bio-nefu.com/resource/jcd-dea.
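The distinction between individually and jointly significant variables can be illustrated with a small simulation. The sketch below is not the JCD-DEA algorithm; it is a minimal toy example under stated assumptions: two hypothetical features share a large common variation, the group effect appears only in their difference, and the projection direction (x1 - x2) is fixed by hand rather than searched for. All function names and parameter values are illustrative.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch's two-sample t statistic (unequal variances)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def simulate(n_per_group=50, shift=2.0, shared_sd=3.0, noise_sd=0.3, seed=0):
    """Toy data (assumed, not from the paper): both features carry a large
    shared component, and the group shift is split between them with
    opposite signs."""
    rng = random.Random(seed)
    samples = []
    for label in (0, 1):
        for _ in range(n_per_group):
            shared = rng.gauss(0.0, shared_sd)
            x1 = shared + label * shift / 2 + rng.gauss(0.0, noise_sd)
            x2 = shared - label * shift / 2 + rng.gauss(0.0, noise_sd)
            samples.append((label, x1, x2))
    return samples


def t_statistics(samples):
    """Return t statistics for x1 alone, x2 alone, and the joint projection x1 - x2."""
    group0 = [(x1, x2) for label, x1, x2 in samples if label == 0]
    group1 = [(x1, x2) for label, x1, x2 in samples if label == 1]
    t_x1 = welch_t([p[0] for p in group1], [p[0] for p in group0])
    t_x2 = welch_t([p[1] for p in group1], [p[1] for p in group0])
    # Projecting onto (1, -1) cancels the shared component and keeps the shift.
    t_joint = welch_t([p[0] - p[1] for p in group1], [p[0] - p[1] for p in group0])
    return t_x1, t_x2, t_joint


if __name__ == "__main__":
    t_x1, t_x2, t_joint = t_statistics(simulate())
    print(f"t(x1) = {t_x1:.2f}, t(x2) = {t_x2:.2f}, t(x1 - x2) = {t_joint:.2f}")
```

In this setup each feature tested alone is dominated by the shared variation, while the pair tested jointly separates the groups clearly, which is the kind of signal that per-variable testing followed by combination can miss.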
