Identification of combination gene sets for glioma classification.

One goal of gene expression profiling of cancer tissues is to identify signature genes that robustly distinguish different types or grades of tumors. Such signature genes would ideally provide a molecular basis for classification and also yield insight into the molecular events underlying different cancer phenotypes. This study applies a recently developed algorithm to identify not only single classifier genes but also gene sets (combinations) for use as glioma classifiers. Classifier genes identified by this algorithm are shown to be strong features by conservatively and collectively considering the misclassification errors of the feature sets. Applying this approach to a test set of 25 patients, we have identified the best single genes and two- to three-gene combinations for distinguishing four types of glioma: (a) oligodendroglioma; (b) anaplastic oligodendroglioma; (c) anaplastic astrocytoma; and (d) glioblastoma multiforme. Some of the identified genes, such as insulin-like growth factor-binding protein 2, have been confirmed to be associated with one of the tumor types. Combining genes significantly lowers the classification error rate. In many instances, neither gene of a two-gene set is an accurate classifier on its own, but the two genes together form a robust classifier with a small error rate. Two-gene and three-gene combinations thus provide robust classifiers with the potential to translate expression microarray results into diagnostic histopathological assays for clinical use.
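The idea that two genes can classify well together even when neither does so alone can be illustrated with a small sketch. This is not the algorithm used in the study: it assumes a nearest-centroid classifier scored by leave-one-out error as a simple stand-in, and the data are synthetic (two hypothetical tumor classes "A" and "B", three hypothetical genes indexed 0 to 2). It exhaustively scores every gene set of a given size and ranks the sets by error rate.

```python
from itertools import combinations


def nearest_centroid_predict(train_x, train_y, x):
    """Assign x to the class whose training centroid is closest (squared Euclidean)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_error(samples, labels, genes):
    """Leave-one-out error rate using only the gene columns listed in `genes`."""
    proj = [[row[g] for g in genes] for row in samples]
    wrong = 0
    for i in range(len(proj)):
        train_x = proj[:i] + proj[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_centroid_predict(train_x, train_y, proj[i]) != labels[i]:
            wrong += 1
    return wrong / len(proj)


def rank_gene_sets(samples, labels, size):
    """Score every gene combination of the given size; lowest error first."""
    n_genes = len(samples[0])
    scored = [(loo_error(samples, labels, combo), combo)
              for combo in combinations(range(n_genes), size)]
    return sorted(scored)


# Synthetic expression values (assumption, not data from the study).
# Genes 0 and 1 overlap between classes individually but separate jointly;
# gene 2 is uninformative noise.
ts = [-2, -1, 0, 1, 2]
noise = [0.2, -0.1, 0.0, 0.1, -0.2]
class_a = [[t + 1, -t + 1, n] for t, n in zip(ts, noise)]
class_b = [[t - 1, -t - 1, n] for t, n in zip(ts, noise)]
samples = class_a + class_b
labels = ["A"] * len(class_a) + ["B"] * len(class_b)

singles = rank_gene_sets(samples, labels, 1)
pairs = rank_gene_sets(samples, labels, 2)

print("best single gene:", singles[0])
print("best gene pair:  ", pairs[0])
```

In this toy data, the best single gene misclassifies 4 of the 10 samples, while the pair of genes 0 and 1 misclassifies none. An exhaustive search like this grows combinatorially with the number of genes, so a real microarray analysis would need a preselected gene subset or parallel computation.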
