Cancer classification using Rotation Forest

We address microarray-based cancer classification using a recently proposed multiple classifier system (MCS), Rotation Forest. To the best of our knowledge, this is the first application of Rotation Forest to microarray data classification. In Rotation Forest, a linear transformation projects the data into a new feature space for each base classifier, and the base classifiers are then trained in these different spaces, which is intended to improve both the accuracy of the individual classifiers and the diversity of the ensemble. Principal component analysis (PCA), non-parametric discriminant analysis (NDA) and random projections (RP) were used for the feature transformation in the original Rotation Forest. In this paper, we use independent component analysis (ICA) as a new transformation method, since it can better describe the properties of microarray data. A breast cancer dataset and a prostate cancer dataset are used to validate the effectiveness of Rotation Forest. In all experiments, Rotation Forest outperforms other MCSs such as Bagging and Boosting. The experimental results also show that ICA further improves the performance of Rotation Forest compared with the original transformation methods.
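
The per-classifier transformation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes NumPy, uses PCA as the transformation, and simplifies the sampling step to drawing 75% of the rows without replacement. The names `pca_axes`, `build_rotation`, and `sample_frac` are hypothetical. The sketch builds one rotation matrix by splitting the features into random disjoint subsets, running PCA on each subset, and placing the resulting axes in the matching block of a full-size matrix.

```python
import numpy as np


def pca_axes(X):
    """Return all principal axes of X as the columns of a square matrix."""
    Xc = X - X.mean(axis=0)
    # With full_matrices=True, vt is always (n_features, n_features) and orthogonal.
    _, _, vt = np.linalg.svd(Xc, full_matrices=True)
    return vt.T


def build_rotation(X, n_subsets, rng, sample_frac=0.75):
    """Build one rotation matrix for a single base classifier (illustrative)."""
    n_samples, n_features = X.shape
    # Randomly partition the feature indices into disjoint subsets.
    subsets = np.array_split(rng.permutation(n_features), n_subsets)
    R = np.zeros((n_features, n_features))
    for idx in subsets:
        # Assumption: a plain random subsample of rows stands in for the
        # sampling scheme of the original method.
        n_rows = max(1, int(sample_frac * n_samples))
        rows = rng.choice(n_samples, size=n_rows, replace=False)
        # Fit PCA on this feature subset and store its axes in the block
        # of R indexed by the same features.
        R[np.ix_(idx, idx)] = pca_axes(X[np.ix_(rows, idx)])
    return R


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 12))      # toy data: 40 samples, 12 features
R = build_rotation(X, n_subsets=4, rng=rng)
X_rot = X @ R                      # rotated data for one base classifier
```

In an ensemble, each base classifier would call `build_rotation` with fresh randomness and be trained on its own `X @ R`, so the classifiers see differently rotated versions of the same data. Replacing `pca_axes` with an ICA routine would correspond to the variant proposed here, although the resulting matrix would then generally not be orthogonal.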
