A multi-index ROC-based methodology for high throughput experiments in gene discovery

We address the problem of ranking differentially expressed genes in high-throughput experiments using Receiver Operating Characteristic (ROC) curves. Because it is generally unknown whether large expression values constitute 'positive' or 'negative' results, or which group is 'healthy' and which is 'diseased', we generate four ROC curves per gene. We then consider classification indices based on all or part of the four ROC curves and identify genes that are ranked low by the area under the curve (AUC) but high by at least one alternative index, invariably resulting in the discovery of genes that would otherwise be missed by the AUC index.
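
The four-curve construction can be illustrated with a minimal sketch. It assumes two unlabeled sample groups (here called A and B) and builds one empirical ROC curve for each combination of "which group is diseased" and "are high or low values positive". The function names (`roc_points`, `auc`, `four_curve_indices`) are hypothetical, and the Youden index is used only as an example of an alternative index; the abstract does not specify which indices the authors use.

```python
import numpy as np

def roc_points(scores_pos, scores_neg, high_is_positive=True):
    """Empirical ROC points (FPR, TPR), sweeping every observed threshold."""
    pos = np.asarray(scores_pos, dtype=float)
    neg = np.asarray(scores_neg, dtype=float)
    if not high_is_positive:
        # Treat low expression as the 'positive' result by flipping the sign.
        pos, neg = -pos, -neg
    thresholds = np.unique(np.concatenate([pos, neg]))[::-1]  # descending
    tpr = np.array([0.0] + [(pos >= t).mean() for t in thresholds])
    fpr = np.array([0.0] + [(neg >= t).mean() for t in thresholds])
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the empirical ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

def four_curve_indices(group_a, group_b):
    """AUC and Youden index for each of the four ROC curves of one gene.

    Keys are (diseased_group, positive_direction); both labels are
    illustrative assumptions, not terminology from the paper.
    """
    results = {}
    labelings = {"A": (group_a, group_b), "B": (group_b, group_a)}
    for diseased, (pos, neg) in labelings.items():
        for high in (True, False):
            fpr, tpr = roc_points(pos, neg, high_is_positive=high)
            results[(diseased, "high" if high else "low")] = {
                "auc": auc(fpr, tpr),
                "youden": float(np.max(tpr - fpr)),  # max vertical gap to the chance line
            }
    return results

if __name__ == "__main__":
    # Group B is split around group A: the AUC sits at chance level,
    # while the Youden index still registers partial separation.
    for key, value in four_curve_indices([5, 5, 5, 5], [1, 1, 9, 9]).items():
        print(key, value)
```

In the toy data above, group B has values both below and above group A, so the ROC curve crosses the chance line and the AUC alone would rank the gene low, whereas an index based on part of the curve can still flag it.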
