Cancer Outlier Analysis Based on Mixture Modeling of Gene Expression Data

Molecular heterogeneity of cancer, partially caused by various chromosomal aberrations or gene mutations, can yield substantial heterogeneity in the gene expression profiles of cancer samples. To detect cancer-related genes that are active only in a subset of cancer samples (cancer outliers), several methods have been proposed in the context of multiple testing. Such cancer outlier analyses generally suffer from a serious lack of power compared with the standard multiple testing setting, in which common activation of genes across all cancer samples is assumed. In this paper, we consider information sharing across genes and cancer samples via parametric normal mixture modeling of the gene expression levels of cancer samples across genes, after standardization using reference normal-sample data. A gene-based statistic for gene selection is developed on the basis of the posterior probability of being a cancer outlier for each cancer sample. Our method showed some efficiency improvement, even under settings with misspecified, heavy-tailed t-distributions. An application to a real dataset from hematologic malignancies is provided.
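The pipeline outlined in the abstract (standardize against normal samples, fit a normal mixture across genes, score each gene from per-sample posterior outlier probabilities) can be illustrated with a minimal sketch. This is not the paper's implementation: the two-component form with a fixed N(0, 1) null, the EM starting values, the use of a simple sum of posteriors as the gene statistic, and all function names are assumptions made for illustration only.

```python
import numpy as np

def standardize(cancer, normal):
    """Standardize each gene (row) of the cancer data by the mean and SD
    of the same gene in the reference normal samples."""
    mu = normal.mean(axis=1, keepdims=True)
    sd = normal.std(axis=1, ddof=1, keepdims=True)
    return (cancer - mu) / sd

def normal_pdf(z, mean, sd):
    """Density of N(mean, sd^2) evaluated at z."""
    return np.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def fit_two_component_mixture(z, n_iter=200):
    """EM for (1 - pi1) * N(0, 1) + pi1 * N(m, s^2) on pooled values z.
    Assumption: the null component is fixed at N(0, 1); only the outlier
    component (pi1, m, s) is estimated."""
    pi1, m, s = 0.1, float(np.quantile(z, 0.95)), 1.0  # hypothetical starting values
    for _ in range(n_iter):
        # E-step: posterior probability that each value is an outlier
        f0 = (1.0 - pi1) * normal_pdf(z, 0.0, 1.0)
        f1 = pi1 * normal_pdf(z, m, s)
        w = f1 / (f0 + f1 + 1e-300)
        # M-step: update the outlier component from the weighted values
        pi1 = w.mean()
        m = np.sum(w * z) / np.sum(w)
        s = max(np.sqrt(np.sum(w * (z - m) ** 2) / np.sum(w)), 1e-3)
    return pi1, m, s

def outlier_posteriors(z, pi1, m, s):
    """Posterior outlier probability for every (gene, sample) entry of z."""
    f0 = (1.0 - pi1) * normal_pdf(z, 0.0, 1.0)
    f1 = pi1 * normal_pdf(z, m, s)
    return f1 / (f0 + f1 + 1e-300)

def gene_statistic(post):
    """Illustrative gene-level score: sum of posteriors over cancer samples."""
    return post.sum(axis=1)
```

Given gene-by-sample arrays `cancer` and `normal`, one would call `standardize`, fit the mixture on the flattened standardized values, and rank genes by `gene_statistic(outlier_posteriors(...))`. Pooling values across genes when fitting is what stands in for the information sharing described above.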
