The Bimodality Index: A Criterion for Discovering and Ranking Bimodal Signatures from Cancer Gene Expression Profiling Data

Motivation: Identifying genes with bimodal expression patterns from large-scale expression profiling data is an important analytical task. Model-based clustering is popular for this purpose, and it commonly uses the Bayesian information criterion (BIC) for model selection. In practice, however, BIC appears to be overly sensitive and may identify bimodally expressed genes that are unreliable or not clinically useful. We propose a novel criterion, the bimodality index, not only to identify but also to rank meaningful and reliable bimodal patterns. The bimodality index can be computed using either a mixture model-based algorithm or Markov chain Monte Carlo techniques.

Results: We carried out simulation studies and applied the method to real data from a cancer gene expression profiling study. Our findings suggest that BIC behaves like a lax cutoff based on the bimodality index, and that the bimodality index provides an objective measure to identify and rank meaningful and reliable bimodal patterns from large-scale gene expression datasets. R code to compute the bimodality index is included in the ClassDiscovery package of the Object-Oriented Microarray and Proteomic Analysis (OOMPA) suite, available at http://bioinformatics.mdanderson.org/Software/OOMPA.
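As an illustration of the mixture model-based route, the following is a minimal Python sketch, not the ClassDiscovery implementation. The abstract does not state the formula, so the sketch assumes the index is defined as BI = sqrt(pi (1 - pi)) |mu1 - mu2| / sigma for a two-component normal mixture with mixing proportion pi, component means mu1 and mu2, and a shared standard deviation sigma. The function names `bimodality_index` and `fit_two_component_em`, the plain EM fit, and the simulated sample are all hypothetical choices made for this example.

```python
import math
import random


def bimodality_index(pi, mu1, mu2, sigma):
    """Assumed definition: BI = sqrt(pi * (1 - pi)) * |mu1 - mu2| / sigma."""
    return math.sqrt(pi * (1.0 - pi)) * abs(mu1 - mu2) / sigma


def fit_two_component_em(x, n_iter=200):
    """Fit a two-component normal mixture with a shared standard deviation
    by plain EM. Returns (pi, mu1, mu2, sigma), where pi is the estimated
    proportion of observations in the first (lower-mean) component."""
    xs = sorted(x)
    n = len(xs)
    half = n // 2
    # Crude initialisation: split the sorted values at the median.
    mu1 = sum(xs[:half]) / half
    mu2 = sum(xs[half:]) / (n - half)
    mean = sum(xs) / n
    sigma = max(math.sqrt(sum((v - mean) ** 2 for v in xs) / n), 1e-6)
    pi = 0.5
    for _ in range(n_iter):
        # E-step: posterior probability that each value belongs to component 1.
        # With a shared sigma the normalising constants cancel.
        two_var = 2.0 * sigma ** 2
        resp = []
        for v in xs:
            log1 = math.log(pi) - (v - mu1) ** 2 / two_var
            log2 = math.log(1.0 - pi) - (v - mu2) ** 2 / two_var
            d = log2 - log1
            resp.append(0.0 if d > 700 else 1.0 / (1.0 + math.exp(d)))
        # M-step: update proportion, means, and the shared variance.
        n1 = min(max(sum(resp), 1e-9), n - 1e-9)
        pi = n1 / n
        mu1 = sum(r * v for r, v in zip(resp, xs)) / n1
        mu2 = sum((1.0 - r) * v for r, v in zip(resp, xs)) / (n - n1)
        var = sum(
            r * (v - mu1) ** 2 + (1.0 - r) * (v - mu2) ** 2
            for r, v in zip(resp, xs)
        ) / n
        sigma = math.sqrt(max(var, 1e-12))
    return pi, mu1, mu2, sigma


# Simulated expression values for one gene: two equal-sized groups whose
# means differ by four standard deviations (made-up data, for illustration).
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(100)]
sample += [random.gauss(4.0, 1.0) for _ in range(100)]

pi_hat, mu1_hat, mu2_hat, sigma_hat = fit_two_component_em(sample)
bi = bimodality_index(pi_hat, mu1_hat, mu2_hat, sigma_hat)
print(round(bi, 2))
```

Under this assumed definition, the index grows with the standardized distance between the two component means and shrinks as the groups become unbalanced, so computing it per gene and sorting in decreasing order gives a ranking rather than only a yes/no call.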
