Marker Identification and Classification of Cancer Types Using Gene Expression Data and SIMCA
OBJECTIVES
High-throughput technologies are radically boosting the understanding of living systems, creating enormous opportunities to elucidate the biological processes of cells in different physiological states. In particular, the application of DNA microarrays to monitor expression profiles from tumor cells is improving cancer analysis to levels that classical methods have been unable to reach. However, molecular diagnostics based on expression profiling requires addressing computational issues such as the overwhelming number of variables and the complex, multi-class nature of tumor samples. The objective of the present research has therefore been the development of a computational procedure for feature extraction and classification of gene expression data.
METHODS
The Soft Independent Modeling of Class Analogy (SIMCA) approach has been implemented in a data mining scheme, which allows the identification of those genes that are most likely to confer robust and accurate classification of samples from multiple tumor types.
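The general idea behind SIMCA can be sketched as follows: a separate principal component model is fitted to each class, and a new sample is compared with every class model through its residual, that is, the part of the sample the model cannot reconstruct. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the number of components, the synthetic data, and the nearest-model assignment rule are all hypothetical. A full SIMCA analysis typically uses an F-test on the residuals, which lets a sample match several classes or none.

```python
import numpy as np

def fit_class_model(X, n_components):
    """Fit a PCA model to one class (rows = samples, columns = genes)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal axes of the centred class data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                      # genes x components
    E = Xc - Xc @ P @ P.T                        # training residuals
    n, p = X.shape
    dof = max((n - n_components - 1) * (p - n_components), 1)
    s0 = np.sqrt((E ** 2).sum() / dof)           # typical residual of the class
    return {"mean": mean, "loadings": P, "s0": s0}

def residual_distance(model, x):
    """Residual standard deviation of x, relative to the class's own s0."""
    P = model["loadings"]
    xc = x - model["mean"]
    e = xc - P @ (P.T @ xc)
    s = np.sqrt((e ** 2).sum() / (x.size - P.shape[1]))
    return s / model["s0"]

def simca_predict(models, x):
    """Assumed simplification: assign x to the class with the smallest distance."""
    distances = {label: residual_distance(m, x) for label, m in models.items()}
    return min(distances, key=distances.get), distances

# Synthetic stand-in for expression data: two classes, 50 hypothetical genes.
rng = np.random.default_rng(0)
n_genes = 50
axes_a = rng.normal(size=(2, n_genes))
axes_b = rng.normal(size=(2, n_genes))

def sample(centre, axes, n):
    scores = rng.normal(size=(n, 2))
    return centre + scores @ axes + 0.1 * rng.normal(size=(n, n_genes))

train_a = sample(0.0, axes_a, 20)
train_b = sample(3.0, axes_b, 20)
models = {"A": fit_class_model(train_a, 2), "B": fit_class_model(train_b, 2)}

new_a = sample(0.0, axes_a, 1)[0]
new_b = sample(3.0, axes_b, 1)[0]
label_a, dist_a = simca_predict(models, new_a)
label_b, dist_b = simca_predict(models, new_b)
print(label_a, label_b)
```

Because each class has its own model, adding a further tumor type only requires fitting one more model; the existing ones are left unchanged.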
RESULTS
The proposed method has been tested on two different microarray data sets, namely Golub's analysis of acute human leukemia and the small round blue cell tumors study presented by Khan et al. The identified features represent a rational and dimensionally reduced basis for understanding the biology of diseases, defining targets of therapeutic intervention, and developing diagnostic tools for classification of pathological states.
CONCLUSIONS
The analysis of the SIMCA model residuals allows the identification of specific phenotype markers. At the same time, the class analogy approach provides the assignment to multiple classes, such as different pathological conditions or tissue samples, for previously unseen instances.
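One common way to turn SIMCA residuals into a per-variable ranking is the discrimination power: for each gene, the residual variance obtained when samples are fitted to the other class's model is compared with the residual variance under their own class model. The sketch below illustrates that idea only. Whether the authors used this exact statistic is not stated here, and the function names, the plain averaging used in place of degrees-of-freedom corrections, and the synthetic data are assumptions.

```python
import numpy as np

def pca_model(X, n_components):
    """Return the class mean and PCA loadings (genes x components)."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components].T

def residual_variance_per_gene(X, model):
    """Mean squared residual of each gene when X is fitted to the given model."""
    mean, P = model
    Xc = X - mean
    E = Xc - Xc @ P @ P.T
    return (E ** 2).mean(axis=0)

def discrimination_power(X_r, X_q, n_components):
    """Per-gene ratio of cross-model to own-model residual variance."""
    m_r = pca_model(X_r, n_components)
    m_q = pca_model(X_q, n_components)
    cross = (residual_variance_per_gene(X_r, m_q)
             + residual_variance_per_gene(X_q, m_r))
    own = (residual_variance_per_gene(X_r, m_r)
           + residual_variance_per_gene(X_q, m_q))
    return np.sqrt(cross / own)

# Synthetic example: 30 hypothetical genes, only gene 7 differs between classes.
rng = np.random.default_rng(1)
X_r = rng.normal(size=(25, 30))
X_q = rng.normal(size=(25, 30))
X_q[:, 7] += 4.0

power = discrimination_power(X_r, X_q, n_components=1)
top_gene = int(np.argmax(power))
print(top_gene)
```

Genes with values close to 1 are fitted about as well by either class model and carry little class information; genes with large values are candidate markers for separating the two classes.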
[1] S. Wold, et al. SIMCA: A Method for Analyzing Chemical Data in Terms of Similarity and Analogy, 1977.
[2] J. Mesirov, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, 1999, Science.
[3] M. Ringnér, et al. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks, 2001, Nature Medicine.
[4] I. T. Jolliffe, et al. Generalizations and Adaptations of Principal Component Analysis, 1986.