Multi-category bioinformatics dataset classification using extreme learning machine

This paper presents recently introduced learning algorithm called Extreme Learning Machine (ELM) for Single-hidden Layer Feed-forward Neural-networks (SLFNs) which randomly chooses hidden nodes and analytically determines the output weights of SLFNs. The ELM avoids problems like local minima, improper learning rate and over fitting commonly faced by iterative learning methods and completes the training very fast. We have evaluated the multi-category classification performance of ELM on five different data sets related to bioinformatics namely, the Breast Cancer Wisconsin data set, the Pima Diabetes data set, the Heart-Statlog data set, the Hepatitis data set and the Hypothyroid data set. A detailed analysis of different activation functions with varying number of neurons is also carried out which concludes that Algebraic Sigmoid function outperforms all other activation functions on these data sets. The evaluation results indicate that ELM produces better classification accuracy with reduced training time and implementation complexity compared to earlier implemented models.

[1]  Constantin F. Aliferis,et al.  A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis , 2004, Bioinform..

[2]  Guang-Bin Huang,et al.  Extreme learning machine: a new learning scheme of feedforward neural networks , 2004, 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541).

[3]  Harry Zhang,et al.  Full Bayesian network classifiers , 2006, ICML.

[4]  Siegfried J. Pöppl,et al.  The 'subsequent artificial neural network' (SANN) approach might bring more classificatory power to ANN-based DNA microarray analyses , 2004, Bioinform..

[5]  Tao Li,et al.  A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression , 2004, Bioinform..

[6]  Chee Kheong Siew,et al.  Extreme learning machine: RBF network case , 2004, ICARCV 2004 8th Control, Automation, Robotics and Vision Conference, 2004..

[7]  Pedro M. Domingos,et al.  Tree Induction for Probability-Based Ranking , 2003, Machine Learning.

[8]  Tom Fawcett,et al.  An introduction to ROC analysis , 2006, Pattern Recognit. Lett..

[9]  Sayan Mukherjee,et al.  Molecular classification of multiple tumor types , 2001, ISMB.

[10]  Jae Won Lee,et al.  An extensive comparison of recent classification tools applied to microarray data , 2004, Comput. Stat. Data Anal..

[11]  M. Ringnér,et al.  Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks , 2001, Nature Medicine.

[12]  P. Saratchandran,et al.  Multicategory Classification Using An Extreme Learning Machine for Microarray Gene Expression Cancer Diagnosis , 2007, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[13]  C. Siew,et al.  Extreme Learning Machine with Randomly Assigned RBF Kernels , 2005 .

[14]  T. Poggio,et al.  Multiclass cancer diagnosis using tumor gene expression signatures , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[15]  M. Ringnér,et al.  Analyzing array data using supervised methods. , 2002, Pharmacogenomics.