Biological validation of differentially expressed genes in chronic lymphocytic leukemia identified by applying multiple statistical methods to oligonucleotide microarrays.

Oligonucleotide microarrays are a powerful tool for profiling the expression levels of thousands of genes. Different statistical methods for identifying differentially expressed genes can yield different results. To our knowledge, no experimental test has been performed to decide which method best identifies genes that are truly differentially expressed. We applied three statistical methods (dChip, t-test on log-transformed data, and Wilcoxon test) to identify differentially expressed genes in previously untreated patients with chronic lymphocytic leukemia (CLL). We used a training set of Affymetrix Hu133A microarray data from 11 patients with unmutated immunoglobulin (Ig) heavy chain variable region (VH) genes and 8 patients with mutated Ig VH genes. Differential expression was validated in two ways: by semiquantitative real-time polymerase chain reaction assays, and by validating models that predict the somatic mutation status of an independent test set of nine CLL samples. Together, the methods identified 144 genes that were differentially expressed between cases of CLL with unmutated and mutated Ig VH genes. Eighty genes were identified by the Wilcoxon test, 60 by the t-test, and 65 by dChip, but only 11 were identified by all three methods. Agreement was greater between the t-test and the Wilcoxon test than between either test and dChip. Differential expression was validated by semiquantitative real-time polymerase chain reaction assays for 83% of individual genes, regardless of the statistical method. However, the Wilcoxon test gave the most accurate predictions on new samples, and dChip the least accurate. We found that all three methods were equally good at finding differentially expressed genes, but they found different genes. The genes selected by the nonparametric Wilcoxon test are the most robust for predicting the status of new cases. A comprehensive list of all differentially expressed genes can be obtained only by combining the results of multiple statistical tests.
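To make the comparison concrete, the following is a minimal sketch of how two of the three methods (a t-test on log-transformed data and the Wilcoxon rank-sum test) can select different genes from the same data. It is not the authors' pipeline: the gene names, intensities, group sizes, and both cutoffs (`t_cut`, `u_dev_cut`) are invented for illustration, the t statistic is the Welch (unequal-variance) form, which the abstract does not specify, and dChip is not reproduced here.

```python
import math
from statistics import mean, variance


def welch_t_log2(a, b):
    """Welch t statistic computed on log2-transformed intensities."""
    la = [math.log2(x) for x in a]
    lb = [math.log2(x) for x in b]
    se = math.sqrt(variance(la) / len(la) + variance(lb) / len(lb))
    return (mean(la) - mean(lb)) / se


def mann_whitney_u(a, b):
    """Rank-sum (Mann-Whitney) U statistic for group a; ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def call_genes(data, t_cut, u_dev_cut):
    """Return (t-test calls, Wilcoxon calls) under hypothetical cutoffs."""
    t_calls, w_calls = set(), set()
    for gene, (unmut, mut) in data.items():
        if abs(welch_t_log2(unmut, mut)) >= t_cut:
            t_calls.add(gene)
        # U ranges from 0 to n1*n2; its distance from the midpoint
        # measures how cleanly the two groups separate by rank.
        centre = len(unmut) * len(mut) / 2
        if abs(mann_whitney_u(unmut, mut) - centre) >= u_dev_cut:
            w_calls.add(gene)
    return t_calls, w_calls


# Made-up intensities: (unmutated group, mutated group), 4 samples each.
TOY = {
    "GENE_A": ([800, 900, 1000, 1100], [100, 120, 90, 110]),  # clean shift
    "GENE_B": ([200, 210, 190, 205], [195, 200, 208, 202]),   # no shift
    "GENE_C": ([100, 400, 1600, 6400], [10, 30, 60, 90]),     # noisy shift
}

# u_dev_cut=8.0 demands complete rank separation for 4 vs 4 samples.
t_calls, w_calls = call_genes(TOY, t_cut=4.0, u_dev_cut=8.0)
print("t-test calls:  ", sorted(t_calls))
print("Wilcoxon calls:", sorted(w_calls))
print("both methods:  ", sorted(t_calls & w_calls))
print("either method: ", sorted(t_calls | w_calls))
```

On this toy data, the hypothetical GENE_C separates the two groups completely by rank, so the rank-based criterion selects it, but its within-group spread keeps the t statistic below the chosen cutoff. The intersection of the two call sets is therefore smaller than their union, which mirrors, on a very small scale, the gap the abstract reports between the 11 genes found by all three methods and the 144 found by at least one.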
