Tree-based analysis of microarray data for classifying breast cancer.

DNA microarrays make it possible to assess the expression levels of thousands of genes simultaneously. One use of this information is the classification of tumors. A noted challenge in using microarray data is the analysis itself. Following the work of Zhang et al. (1), we further pursue the use of recursive partitioning in the analysis of microarray data for cancer classification. The recursive partitioning technique not only creates intuitive classification rules but is also flexible in handling a massive number of genes, missing expression values, and multi-class tissues. Using a published data set (2), we demonstrate that recursive partitioning creates a more precise and simpler classification rule than other commonly used approaches. In particular, we introduce the concept of an A-tree and propose a procedure to assess a large number of A-trees. One of the identified genes (ERBB2) lies in a region close to BRCA1 (17q21.1) and has been shown by others to have altered expression levels in breast cancer. Nonetheless, the identified genes warrant further investigation as to whether they play a role in the etiology of breast cancer.
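To make the idea of recursive partitioning concrete, the following is a minimal sketch of the single step that such a tree repeats at every node: choosing the gene and expression threshold that best separate the tissue classes. It is not the authors' procedure and does not implement A-trees. The Gini impurity criterion, the function names `gini` and `best_split`, and the toy expression values are all assumptions made for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(expr, labels):
    """Return (gene_index, threshold, weighted_impurity) of the best single split.

    expr:   list of samples, each a list of expression values (one per gene).
    labels: class label of each sample.
    """
    n = len(labels)
    best = None
    for g in range(len(expr[0])):
        values = sorted(set(row[g] for row in expr))
        # Candidate thresholds: midpoints between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for row, y in zip(expr, labels) if row[g] <= t]
            right = [y for row, y in zip(expr, labels) if row[g] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            # Strict comparison keeps the first gene found in case of ties.
            if best is None or score < best[2]:
                best = (g, t, score)
    return best


# Toy data (made up): 6 samples x 3 genes, two tissue classes.
expr = [
    [1.0, 5.0, 0.2],
    [1.2, 4.0, 0.9],
    [0.9, 6.0, 0.4],
    [3.1, 5.5, 0.3],
    [2.8, 4.5, 0.8],
    [3.5, 5.2, 0.5],
]
labels = ["A", "A", "A", "B", "B", "B"]

gene, threshold, impurity = best_split(expr, labels)
print(gene, threshold, impurity)
```

A full tree would apply `best_split` again to the samples on each side of the threshold until the nodes are pure or too small to split. Because every candidate gene is scanned independently, the same loop scales to thousands of genes, and the resulting rule reads as a short chain of "gene above or below threshold" tests.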

[1] Burton H. Singer, et al. Recursive partitioning in the health sciences, 1999.

[2] E. Lander. Array of hope, 1999, Nature Genetics.

[3] J. Hacia. Resequencing and mutational analysis using oligonucleotide microarrays, 1999, Nature Genetics.

[4] P. Goodfellow, et al. DNA microarrays in drug discovery and development, 1999, Nature Genetics.

[5] S. Wölfl, et al. Molecular characterization of breast cancer cell lines by expression profiling, 2002, Journal of Cancer Research and Clinical Oncology.

[6] J. Mesirov, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, 1999, Science.

[7] Shinichi Morishita, et al. On Classification and Regression, 1998, Discovery Science.

[8] S. Dudoit, et al. Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data, 2002.

[9] D. Haussler, et al. Knowledge-based analysis of microarray gene expression data by using support vector machines, 2000, Proceedings of the National Academy of Sciences of the United States of America.

[10] Christian A. Rees, et al. Distinctive gene expression patterns in human mammary epithelial cells and breast cancers, 1999, Proceedings of the National Academy of Sciences of the United States of America.

[11] E. Boerwinkle, et al. Computational methods for gene expression-based tumor classification, 2000, BioTechniques.

[12] M. Xiong, et al. Recursive partitioning for tumor classification with gene expression microarray data, 2001, Proceedings of the National Academy of Sciences of the United States of America.

[13] U. Alon, et al. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays, 1999, Proceedings of the National Academy of Sciences of the United States of America.

[14] J. Sudbø, et al. Gene-expression profiles in hereditary breast cancer, 2001, The New England Journal of Medicine.

[15] I. Mian, et al. Analysis of molecular profile data using generative and discriminative methods, 2000, Physiological Genomics.

[16] L. Neckers, et al. Modulation of p53, ErbB1, ErbB2, and Raf-1 expression in lung cancer cells by depsipeptide FR901228, 2002, Journal of the National Cancer Institute.

[17] E. Dougherty, et al. Gene-expression profiles in hereditary breast cancer, 2001, The New England Journal of Medicine.