Computational Analysis of Leukemia Microarray Expression Data Using the GA/KNN Method

We recently developed a multivariate method that selects a subset of discriminative genes for sample classification based on gene expression data. The method combines a search tool, a genetic algorithm (GA), with a nonparametric pattern recognition method based on the k-nearest neighbors (KNN) approach. We begin by using a training set to select many subsets of genes that can discriminate among classes of samples. The genes are then ranked by how frequently they are selected across these subsets. The top-ranked genes (e.g., 50) are then used to classify test set samples. For a widely available set of leukemia data, the top 50 genes identified by the GA/KNN method not only correctly classified 33 of the 34 test set samples, but also discovered the two distinct clinical subtypes within acute lymphoblastic leukemia (ALL) without applying prior knowledge. The method has been successfully applied to several expression data sets. It may be used to identify a subset of informative genes (biomarkers) for sample classification in a variety of profiling studies, including studies of tumors.
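The workflow described above (search for many discriminative gene subsets, rank genes by selection frequency, keep the top-ranked genes) can be illustrated with a minimal sketch. It is not the published implementation: it assumes a mutation-only GA, Euclidean distance, a simple majority vote, and leave-one-out accuracy as the fitness score. The function names (`loo_knn_accuracy`, `evolve_subset`, `rank_genes`) and all parameter defaults are hypothetical choices made for illustration.

```python
import numpy as np

def loo_knn_accuracy(X, y, k=3):
    """Leave-one-out KNN accuracy (Euclidean distance, majority vote).
    X: (n_samples, n_features); y: non-negative integer class labels."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)              # a sample may not vote for itself
    nearest = np.argsort(d, axis=1)[:, :k]   # indices of the k nearest neighbors
    pred = np.array([np.bincount(v).argmax() for v in y[nearest]])
    return float(np.mean(pred == y))

def evolve_subset(X, y, rng, subset_size=5, pop_size=20, generations=30, k=3):
    """Evolve one gene subset; fitness is leave-one-out KNN accuracy.
    Assumption: selection plus single-gene mutation only, no crossover."""
    n_genes = X.shape[1]
    pop = [rng.choice(n_genes, size=subset_size, replace=False)
           for _ in range(pop_size)]
    for _ in range(generations):
        scores = np.array([loo_knn_accuracy(X[:, c], y, k) for c in pop])
        order = np.argsort(-scores)
        if scores[order[0]] == 1.0:          # a perfect subset was found
            break
        survivors = [pop[i] for i in order[: pop_size // 2]]
        children = []
        for parent in survivors:
            child = parent.copy()
            unused = np.setdiff1d(np.arange(n_genes), child)
            # swap one gene in the subset for a gene not currently in it
            child[rng.integers(subset_size)] = rng.choice(unused)
            children.append(child)
        pop = survivors + children
    scores = np.array([loo_knn_accuracy(X[:, c], y, k) for c in pop])
    return pop[int(np.argmax(scores))]

def rank_genes(X, y, n_runs=50, seed=0, **ga_kwargs):
    """Run the search n_runs times and rank genes by selection frequency."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(X.shape[1], dtype=int)
    for _ in range(n_runs):
        counts[evolve_subset(X, y, rng, **ga_kwargs)] += 1
    return np.argsort(-counts, kind="stable"), counts
```

Given a training matrix `X_train` (samples by genes) and integer labels `y_train`, `ranking, counts = rank_genes(X_train, y_train)` returns gene indices ordered from most to least frequently selected; `ranking[:50]` would then be the columns used to classify test samples with KNN. Ranking by frequency across many independent runs, rather than trusting a single best subset, is what makes the final gene list less sensitive to any one search result.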
