Optimal Approach for Classification of Acute Leukemia Subtypes Based on Gene Expression Data

The classification of cancer subtypes, which is critical for successful treatment, has been studied extensively using gene expression profiles from oligonucleotide chips or cDNA microarrays. Various pattern recognition methods have been successfully applied to gene expression data. However, these methods are not optimal; rather, they are high‐performance classifiers that emphasize only classification accuracy. In this paper, we propose an approach for constructing the optimal linear classifier from gene expression data. Two linear classification methods, linear discriminant analysis (LDA) and discriminant partial least‐squares (DPLS), are applied to distinguish acute leukemia subtypes, and both are shown to give satisfactory accuracy. Moreover, we optimally determine the number of genes participating in the classification on the basis of a statistical significance test; this number is remarkably small compared to previous results. Thus, the proposed method constructs an optimal classifier that uses a small set of predictor genes and provides high accuracy.
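
As a rough illustration of the general idea (rank genes by a two-class test statistic, keep a small subset, then fit a linear discriminant), a minimal sketch follows. It is not the procedure used in the paper. The data are synthetic, the sample sizes and the function names `t_statistics`, `fit_lda`, and `predict_lda` are hypothetical, and the number of retained genes `k` is fixed by hand rather than chosen by a significance test.

```python
import numpy as np

def t_statistics(X, y):
    """Welch t-statistic per gene (column) between class 0 and class 1."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return (a.mean(axis=0) - b.mean(axis=0)) / se

def fit_lda(X, y, ridge=1e-3):
    """Two-class Fisher LDA; returns the weight vector w and threshold c."""
    a, b = X[y == 0], X[y == 1]
    m0, m1 = a.mean(axis=0), b.mean(axis=0)
    # Pooled within-class covariance, plus a small ridge for numerical stability.
    S = ((a - m0).T @ (a - m0) + (b - m1).T @ (b - m1)) / (len(X) - 2)
    S += ridge * np.eye(X.shape[1])
    w = np.linalg.solve(S, m1 - m0)
    c = w @ (m0 + m1) / 2.0  # midpoint of the projected class means
    return w, c

def predict_lda(X, w, c):
    """Assign class 1 when the projection exceeds the threshold."""
    return (X @ w > c).astype(int)

# Synthetic stand-in for expression data (assumption: not real leukemia data).
rng = np.random.default_rng(0)
n_train, n_test, n_genes, n_informative = 40, 30, 500, 5
y_train = np.arange(n_train) % 2
y_test = np.arange(n_test) % 2
X_train = rng.normal(size=(n_train, n_genes))
X_test = rng.normal(size=(n_test, n_genes))
# Shift the first few genes in class 1 so that they carry the class signal.
X_train[y_train == 1, :n_informative] += 3.0
X_test[y_test == 1, :n_informative] += 3.0

# Rank genes on the training set only, then keep the k strongest.
k = 5  # fixed here for illustration
selected = np.argsort(-np.abs(t_statistics(X_train, y_train)))[:k]

w, c = fit_lda(X_train[:, selected], y_train)
accuracy = float(np.mean(predict_lda(X_test[:, selected], w, c) == y_test))
print(f"selected genes: {sorted(selected.tolist())}, test accuracy: {accuracy:.2f}")
```

Ranking genes on the training samples alone keeps the test accuracy from being inflated by selection bias. The ridge term is an added safeguard for the case where the number of retained genes approaches the number of samples.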
