RDCurve: A Nonparametric Method to Evaluate the Stability of Ranking Procedures

Serious concerns have been raised about the reproducibility of gene signatures derived from high-throughput techniques such as microarrays. Studies analyzing similar samples often report poorly overlapping results, and the p-value usually lacks biological context. We propose a nonparametric ReDiscovery Curve (RDCurve) method to estimate how frequently an identified gene signature is rediscovered. Given a ranking procedure and a data set with replicated measurements, the RDCurve bootstraps the data set, repeatedly applies the ranking procedure, selects a subset of k important genes, and estimates the probability that the selected subset is rediscovered. We also propose a permutation scheme to estimate the confidence band under the null hypothesis, which is used to assess the significance of the RDCurve. The method is nonparametric and model-independent. With the RDCurve, we can assess the signal-to-noise ratio of the data, compare the performance of ranking procedures in terms of their expected rediscovery rates, and choose the number of genes to be reported.
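
The procedure summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions that the abstract does not specify: the ranking procedure is a Welch t-statistic, the bootstrap resamples replicates within each class, and "rediscovery" at a given k is taken to be the fraction of the original top-k genes that reappear in the top k of a bootstrap sample. The function names (t_stat_ranking, rd_curve, null_band) and the synthetic data are hypothetical and serve only as illustration.

```python
import numpy as np

def t_stat_ranking(X, y):
    """Assumed ranking procedure: order genes by absolute Welch t-statistic.
    X is (genes x samples); y holds 0/1 class labels per sample."""
    a, b = X[:, y == 0], X[:, y == 1]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1]
                 + b.var(axis=1, ddof=1) / b.shape[1])
    t = (a.mean(axis=1) - b.mean(axis=1)) / (se + 1e-12)
    return np.argsort(-np.abs(t))  # gene indices, most important first

def bootstrap_within_class(y, rng):
    """Resample sample indices with replacement, separately in each class."""
    return np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=int((y == c).sum()), replace=True)
        for c in np.unique(y)
    ])

def rd_curve(X, y, rank_fn, ks, n_boot=30, rng=None):
    """Estimated rediscovery rate for each k (assumed definition: mean
    fraction of the original top-k genes found in a bootstrap top-k)."""
    rng = np.random.default_rng(rng)
    n_genes = X.shape[0]
    reference = rank_fn(X, y)          # ranking on the original data
    curve = np.zeros(len(ks))
    for _ in range(n_boot):
        idx = bootstrap_within_class(y, rng)
        boot = rank_fn(X[:, idx], y[idx])
        pos = np.empty(n_genes, dtype=int)
        pos[boot] = np.arange(n_genes)  # pos[g] = rank of gene g in bootstrap
        for j, k in enumerate(ks):
            curve[j] += np.mean(pos[reference[:k]] < k)
    return curve / n_boot

def null_band(X, y, rank_fn, ks, n_perm=20, n_boot=30, alpha=0.05, rng=None):
    """Permutation band: recompute the curve after shuffling class labels,
    then take pointwise quantiles across permutations."""
    rng = np.random.default_rng(rng)
    curves = np.array([
        rd_curve(X, rng.permutation(y), rank_fn, ks, n_boot, rng)
        for _ in range(n_perm)
    ])
    return np.quantile(curves, [alpha / 2, 1 - alpha / 2], axis=0)

# Synthetic example: 200 genes, 15 replicates per class, 20 truly shifted genes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 15)
X = rng.normal(size=(200, 30))
X[:20, y == 1] += 3.0

ks = [5, 20, 200]
curve = rd_curve(X, y, t_stat_ranking, ks, rng=rng)
band = null_band(X, y, t_stat_ranking, ks, rng=rng)  # rows: lower, upper
```

Under these assumptions, a curve that lies well above the upper permutation band at a given k suggests that the top-k list reflects signal rather than noise, and comparing curves from different ranking functions passed as rank_fn gives one way to compare ranking procedures. Note that at k equal to the total number of genes the rate is trivially 1.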
