Multicategory cancer classification from gene expression data by multiclass NPPC ensemble

The development of DNA microarray technologies has created an immense opportunity to build gene expression profiles for different cancer types. Besides binary classification, such as normal versus tumor samples, discriminating among multiple tumor types is also important. In this work, we first extend the recently developed binary nonparallel plane proximal classifier (NPPC) to a multiclass NPPC by decomposition techniques. The multiclass NPPC is then used in a computer-aided diagnosis framework to classify multicategory cancer from gene expression data, using very few genes selected by a mutual information criterion. The idea of the binary NPPC ensemble is extended to form a multiclass NPPC ensemble. Besides the usual majority voting method, we introduce a minimum average proximity based decision combiner for the multiclass NPPC ensemble. The effectiveness of the proposed method is demonstrated on four benchmark microarray data sets and compared with a support vector machine (SVM) classifier in a similar framework.
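
The two decision combiners named above can be illustrated with a minimal sketch. It assumes that each ensemble member reports, for every sample, a non-negative proximity (distance) to the plane of each class, with a smaller value meaning a closer match. The array layout, the function names `majority_vote` and `min_average_proximity`, and the toy numbers are hypothetical and chosen only for illustration; the exact combiner used in the paper may differ in detail.

```python
import numpy as np


def majority_vote(distances):
    """Combine members by majority voting.

    distances: array of shape (n_members, n_samples, n_classes), where
    distances[m, i, c] is the assumed proximity of sample i to the plane
    of class c according to member m (smaller means closer).
    """
    # Each member votes for the class whose plane is nearest.
    votes = distances.argmin(axis=2)  # shape (n_members, n_samples)
    n_classes = distances.shape[2]
    # Count the votes each class receives for every sample.
    counts = np.stack(
        [(votes == c).sum(axis=0) for c in range(n_classes)], axis=1
    )
    # Ties are broken in favour of the lowest class index.
    return counts.argmax(axis=1)


def min_average_proximity(distances):
    """Combine members by averaging proximities, then picking the minimum."""
    # Average each class's proximity over all members.
    avg = distances.mean(axis=0)  # shape (n_samples, n_classes)
    # Choose the class with the smallest average proximity.
    return avg.argmin(axis=1)


# Toy example (made-up numbers): 3 members, 2 samples, 3 classes.
distances = np.array([
    [[0.2, 0.9, 0.8], [0.50, 0.55, 0.9]],
    [[0.3, 0.7, 0.9], [0.50, 0.55, 0.9]],
    [[0.1, 0.8, 0.7], [0.95, 0.05, 0.9]],
])

vote_labels = majority_vote(distances)
proximity_labels = min_average_proximity(distances)
```

In this toy input the two rules agree on the first sample (class 0) but differ on the second: two members narrowly prefer class 0, so majority voting returns class 0, while the third member's strong preference for class 1 lowers that class's average proximity enough for the proximity-based rule to return class 1. This shows how a proximity-based combiner can use confidence information that a plain vote count discards.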
