The effect of GeneChip gene definitions on the microarray study of cancers.

The Affymetrix GeneChip is a popular microarray platform for genome-wide expression profiling and has been widely used in functional genomics, especially in the classification of cancers. Because genome data are continually updated, much of the genome information with which the chips were designed is now out of date, and many of the genes/transcripts on the chips have been reported to differ from their original definitions when the probes are mapped to the new genome information. Dai et al. reported that the updated definition can cause as much as a 30-50% discrepancy in the genes selected as differentially expressed in a heart tissue expression profiling dataset. Understanding the nature of this difference is therefore important for making use of the data. In this work, using a large cancer dataset as an example, we compared two major definitions and investigated their effects on classification, clustering, discovery of differentially expressed genes, and gene-set-based analysis. The results show that the two definitions agree well on clustering and classification, but the genes and gene sets identified as differentially expressed or enriched can differ substantially. Discoveries based on the Affymetrix definition cover most of those based on the new definition but tend to include more false positives.
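The comparison of genes selected under two probe-set definitions can be illustrated with a minimal sketch. It is not the procedure used in this work: the function name `compare_gene_lists`, the two example lists, and the use of the Jaccard distance as the "discrepancy" measure are all assumptions made for illustration.

```python
def compare_gene_lists(original, updated):
    """Summarize agreement between two sets of selected genes.

    `original` and `updated` are iterables of gene identifiers selected
    as differentially expressed under each probe-set definition.
    """
    original, updated = set(original), set(updated)
    shared = original & updated
    union = original | updated
    return {
        "shared": len(shared),
        "only_original": len(original - updated),
        "only_updated": len(updated - original),
        # Fraction of the updated-definition genes also found with the original one.
        "coverage_of_updated": len(shared) / len(updated) if updated else 0.0,
        # Jaccard distance, used here as an assumed discrepancy measure.
        "discrepancy": 1 - len(shared) / len(union) if union else 0.0,
    }


# Hypothetical selections, for illustration only (not results from the study).
affy_hits = ["TP53", "MYC", "EGFR", "KRAS", "BRCA1"]
remapped_hits = ["TP53", "MYC", "EGFR", "PTEN"]
summary = compare_gene_lists(affy_hits, remapped_hits)
print(summary)
```

In this toy case the original-definition list covers three of the four updated-definition genes while contributing two genes of its own, which is the pattern of high coverage plus extra selections described above.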

[1] D. Lockhart, et al. Analysis of gene expression profiles in normal and neoplastic ovarian tissue samples identifies candidate molecular markers of epithelial ovarian cancer, 2001, Proceedings of the National Academy of Sciences of the United States of America.

[2] Samy Bengio, et al. SVMTorch: Support Vector Machines for Large-Scale Regression Problems, 2001, J. Mach. Learn. Res.

[3] R. Myers, et al. Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data, 2005, Nucleic acids research.

[4] T. Poggio, et al. Multiclass cancer diagnosis using tumor gene expression signatures, 2001, Proceedings of the National Academy of Sciences of the United States of America.

[5] Kenneth H Buetow, et al. Detecting false expression signals in high-density oligonucleotide arrays by an in silico approach, 2005, Genomics.

[6] L. Staudt, et al. Prediction of survival in follicular lymphoma based on molecular features of tumor-infiltrating immune cells, 2004, The New England journal of medicine.

[7] M. S. San Francisco, et al. Control of exuT activity for galacturonate transport by the negative regulator ExuR in Erwinia chrysanthemi EC16, 2001, Molecular plant-microbe interactions : MPMI.

[8] S. Dhanasekaran, et al. Delineation of prognostic biomarkers in prostate cancer, 2001, Nature.

[9] U. Alon, et al. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays, 1999, Proceedings of the National Academy of Sciences of the United States of America.

[10] Torben F. Ørntoft, et al. Identifying distinct classes of bladder carcinoma using microarrays, 2003, Nature Genetics.

[11] J. Mesirov, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, 1999, Science.

[12] Steen Knudsen, et al. Alternative mapping of probes to genes for Affymetrix chips, 2004, BMC Bioinformatics.

[13] S. Enkemann, et al. A sequence-based identification of the genes detected by probesets on the Affymetrix U133 plus 2.0 array, 2005, Nucleic acids research.

[14] E. Lander, et al. A molecular signature of metastasis in primary solid tumors, 2003, Nature Genetics.

[15] D. Botstein, et al. Gene expression patterns in human liver cancers, 2002, Molecular biology of the cell.

[16] Yudong D. He, et al. Gene expression profiling predicts clinical outcome of breast cancer, 2002, Nature.

[17] Xuegong Zhang, et al. Recursive SVM feature selection and sample classification for mass-spectrometry and microarray data, 2006, BMC Bioinformatics.

[18] C. Li, et al. Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection, 2001, Proceedings of the National Academy of Sciences of the United States of America.

[19] Zoltan Szallasi, et al. Increased measurement accuracy for sequence-verified microarray probes, 2004, Physiological genomics.

[20] D. Botstein, et al. Diversity of gene expression in adenocarcinoma of the lung, 2001, Proceedings of the National Academy of Sciences of the United States of America.