Genomic outlier detection in high-throughput data analysis.

In the analysis of high-throughput data, a very common goal is the detection of genes that show differential expression between two groups or classes. A recent finding in the prostate cancer literature demonstrates that searching for a different pattern of differential expression, one in which only a subset of the samples in one group shows altered expression, can reveal new candidate oncogenes. In this chapter, we describe this statistical problem, termed oncogene outlier detection, and discuss a variety of proposals for addressing it. A statistical model for the multiclass situation is described, and links with multiple testing concepts are established. Some new nonparametric procedures are described and compared with existing methods in simulation studies.
