An integrated system for class prediction using gene expression profiling

Gene expression profiles have been successfully applied to class prediction. Because gene expression data typically contain a large number of genes (features) but only a small number of samples, feature selection is essential for the prediction task. Many methods have been proposed for selecting features in microarray data analysis, but no single method performs uniformly well across all learning algorithms. It is therefore of practical interest to find a combination of feature selection method and learning algorithm that gives superior performance. In this paper, we present an integrated scheme for class prediction based on gene expression profiles. The scheme incorporates a simple, novel feature selection procedure into naive Bayes models. Each selected gene has a high discriminatory-power score, as measured by the Brown-Forsythe test statistic, and every pair of selected genes has low correlation. The low pairwise correlation supports the conditional independence among genes that naive Bayes models assume. To demonstrate its effectiveness, the proposed scheme was applied to three commonly used expression data sets: COLON, OVARIAN, and LEUKEMIA. The results show that the numbers of misclassified samples are 0, and 4, respectively.
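The scheme described above (score each gene, keep high-scoring genes that are weakly correlated with one another, then classify with naive Bayes) can be sketched in plain Python. This is a minimal illustration under several assumptions not stated in the abstract: it uses the Brown-Forsythe test for equality of class means, F* = Σ n_k (x̄_k − x̄)² / Σ (1 − n_k/n) s_k², rather than the Brown-Forsythe test for equal variances; it selects genes greedily in descending score order, rejecting any gene whose absolute Pearson correlation with an already selected gene reaches a threshold; and it uses Gaussian class-conditional densities for the naive Bayes model. The function names and the parameters `max_genes` and `max_corr` are hypothetical, and the toy data are made-up values, not taken from COLON, OVARIAN, or LEUKEMIA.

```python
import math
from statistics import mean, variance


def brown_forsythe(values, labels):
    """Brown-Forsythe statistic for equality of class means (assumed variant).

    F* = sum_k n_k (mean_k - mean)^2 / sum_k (1 - n_k / n) s_k^2
    Each class needs at least two samples for the sample variance s_k^2.
    """
    n = len(values)
    grand = mean(values)
    num = den = 0.0
    for c in set(labels):
        group = [v for v, y in zip(values, labels) if y == c]
        n_k = len(group)
        num += n_k * (mean(group) - grand) ** 2
        den += (1 - n_k / n) * variance(group)
    if den == 0:
        return math.inf if num > 0 else 0.0
    return num / den


def pearson(a, b):
    """Pearson correlation; returns 0.0 for a constant vector."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0


def select_genes(X, labels, max_genes=10, max_corr=0.5):
    """Hypothetical greedy selection: high score, low pairwise correlation."""
    genes = [list(col) for col in zip(*X)]  # X is samples x genes
    scores = [brown_forsythe(g, labels) for g in genes]
    order = sorted(range(len(genes)), key=lambda j: scores[j], reverse=True)
    selected = []
    for j in order:
        if len(selected) == max_genes:
            break
        # Keep gene j only if it is weakly correlated with every gene kept so far.
        if all(abs(pearson(genes[j], genes[s])) < max_corr for s in selected):
            selected.append(j)
    return selected


def fit_naive_bayes(X, labels, genes):
    """Gaussian naive Bayes: per-class log prior and (mean, variance) per gene."""
    model = {}
    for c in set(labels):
        rows = [x for x, y in zip(X, labels) if y == c]
        params = []
        for j in genes:
            col = [r[j] for r in rows]
            params.append((mean(col), variance(col) + 1e-9))  # floor avoids zero variance
        model[c] = (math.log(len(rows) / len(labels)), params)
    return model


def predict(model, x, genes):
    """Return the class with the largest log posterior (up to a constant)."""
    def log_post(c):
        log_prior, params = model[c]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x[j] - mu) ** 2 / (2 * var)
            for j, (mu, var) in zip(genes, params)
        )
    return max(model, key=log_post)


if __name__ == "__main__":
    # Made-up toy data: gene 0 and gene 1 are near copies; gene 2 is uninformative.
    X = [[1.0, 1.1, 3.0], [1.2, 1.3, 2.0], [0.9, 1.0, 2.5],
         [5.0, 5.2, 2.4], [5.1, 5.0, 3.1], [4.8, 4.9, 1.9]]
    y = ["A", "A", "A", "B", "B", "B"]
    genes = select_genes(X, y)
    print(genes)  # → [0, 2]: gene 1 is dropped as redundant with gene 0
    model = fit_naive_bayes(X, y, genes)
    print(predict(model, [1.1, 1.2, 2.6], genes), predict(model, [4.9, 5.0, 2.4], genes))
```

In the toy run, gene 1 has a high score but is rejected because it is almost perfectly correlated with gene 0. That redundancy filter is what keeps the naive Bayes independence assumption from being badly violated by near-duplicate genes.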
