Comparative study of multivariate classification methods using microarray gene expression data for BRCA1/BRCA2 cancer tumors

High dimensionality is one of the major problems in the classification of microarray gene expression data. Most classifiers perform well when the number of features is comparable to the number of samples, but gene expression data have very few samples compared with the number of genes (features). We use class prediction (CP) with the compound covariate predictor (CCP), diagonal linear discriminant analysis (DLDA), k-nearest neighbor (NN), nearest centroid (NC) and support vector machine (SVM) to create multivariate predictors that determine the class of a given data sample. In this paper, CP is used to classify the tumor groups in a microarray dataset taken from breast cancer patients. The paper presents comparative results on the accuracy of cancer gene classification based on six multivariate classifiers. Our results show that CCP performed best, with accuracies of 100%, 85% and 86% for the three tumor groups. Accurate analysis and classification of gene expression profiles could lead to more reliable tumor classification, better prognostic prediction and selection of more appropriate treatments.
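As an illustration of the kind of predictor compared here, the following is a minimal sketch of a two-class compound covariate predictor evaluated by leave-one-out cross-validation. It is not the implementation used in the study. It assumes the common formulation in which genes are ranked by a two-sample t-statistic, the selected genes are combined into a single score weighted by their t-statistics, and a sample is assigned to a class by comparing its score with the midpoint of the two class means. The function names, the `n_genes` parameter and the synthetic data are hypothetical and serve only to make the example runnable.

```python
import numpy as np


def fit_ccp(X, y, n_genes=10):
    """Fit a two-class compound covariate predictor (illustrative sketch).

    X: array of shape (samples, genes); y: array of 0/1 class labels.
    """
    a, b = X[y == 0], X[y == 1]
    na, nb = len(a), len(b)
    # Pooled-variance two-sample t-statistic for every gene.
    pooled = ((na - 1) * a.var(axis=0, ddof=1)
              + (nb - 1) * b.var(axis=0, ddof=1)) / (na + nb - 2)
    t = (b.mean(axis=0) - a.mean(axis=0)) / np.sqrt(pooled * (1 / na + 1 / nb) + 1e-12)
    # Keep the genes with the largest absolute t-statistic.
    genes = np.argsort(-np.abs(t))[:n_genes]
    weights = t[genes]
    # Classification threshold: midpoint of the mean scores of the two classes.
    threshold = ((a[:, genes] @ weights).mean() + (b[:, genes] @ weights).mean()) / 2
    return genes, weights, threshold


def predict_ccp(model, X):
    """Return 0/1 labels; class 1 has the higher mean score by construction."""
    genes, weights, threshold = model
    return (X[:, genes] @ weights > threshold).astype(int)


def loocv_accuracy(X, y, n_genes=10):
    """Leave-one-out accuracy; gene selection is redone inside every fold."""
    hits = 0
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        model = fit_ccp(X[train], y[train], n_genes)
        hits += int(predict_ccp(model, X[i:i + 1])[0] == y[i])
    return hits / len(y)


# Synthetic data (hypothetical): 20 samples, 500 genes, 20 informative genes.
rng = np.random.default_rng(0)
y = np.array([0] * 10 + [1] * 10)
X = rng.normal(size=(20, 500))
X[y == 1, :20] += 3.0
print(f"LOOCV accuracy: {loocv_accuracy(X, y):.2f}")
```

Gene selection is repeated inside each cross-validation fold so that the held-out sample never influences which genes are chosen. With far more genes than samples, selecting genes on the full dataset first would give an optimistically biased accuracy estimate.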
