Quantization and similarity measure selection for discrimination of lymphoma subtypes under k-nearest neighbor classification

Molecular classification of tumors holds great potential for cancer research, diagnosis, and treatment. In this study, we apply a novel classification technique to cDNA microarray data to discriminate between three subtypes of malignant lymphoma: CD5+ diffuse large B-cell lymphoma, CD5- diffuse large B-cell lymphoma, and mantle cell lymphoma. The proposed technique combines the k-Nearest Neighbor (k-NN) algorithm with optimized data quantization. The feature genes on which the classification is based are selected by ranking all genes according to a separability criterion that accounts for both between-class and within-class scatter. The classification errors, estimated using cross-validation, are significantly lower than those produced by classical variants of the k-NN algorithm. Multidimensional scaling and hierarchical clustering dendrograms are used to visualize the separation of the three lymphoma subtypes.
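The pipeline described above (rank genes by class separability, quantize the expression values, classify with k-NN, estimate the error by cross-validation) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Fisher-like ratio of between-class to within-class scatter, the fixed quantization thresholds (the paper optimizes the quantization, this sketch does not), the L1 distance, leave-one-out as the cross-validation scheme, and the synthetic data standing in for microarray measurements. All function names are hypothetical.

```python
import numpy as np

def scatter_ratio(X, y):
    """Per-gene ratio of between-class to within-class scatter (assumed criterion)."""
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        class_mean = Xc.mean(axis=0)
        between += len(Xc) * (class_mean - overall_mean) ** 2
        within += ((Xc - class_mean) ** 2).sum(axis=0)
    return between / (within + 1e-12)  # small constant avoids division by zero

def quantize(X, thresholds):
    """Replace each value by the index of the threshold interval it falls in."""
    return np.digitize(X, thresholds)

def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k nearest training samples under L1 distance."""
    dists = np.abs(X_train - x).sum(axis=1)
    nearest = y_train[np.argsort(dists, kind="stable")[:k]]
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]  # ties go to the smallest label

def loo_error(X, y, n_genes=5, thresholds=(-1.5, 1.5), k=3):
    """Leave-one-out error; genes are re-ranked inside each fold to avoid selection bias."""
    n = len(y)
    wrong = 0
    for i in range(n):
        train = np.arange(n) != i
        ranking = np.argsort(scatter_ratio(X[train], y[train]))[::-1]
        top = ranking[:n_genes]
        Q_train = quantize(X[train][:, top], thresholds)
        q_test = quantize(X[i, top], thresholds)
        wrong += int(knn_predict(Q_train, y[train], q_test, k) != y[i])
    return wrong / n

# Synthetic stand-in data: 3 classes x 10 samples, 50 genes, first 5 informative.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 10)
X = rng.normal(0.0, 0.5, size=(30, 50))
X[:, :5] += (y[:, None] - 1) * 3.0

err = loo_error(X, y)
print("leave-one-out error:", err)
```

Re-ranking the genes inside each fold is a deliberate choice in this sketch: selecting genes on the full data set before cross-validation would let the held-out sample influence the feature set and bias the error estimate downward.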
