Neuro-logistic Models Based on Evolutionary Generalized Radial Basis Function for the Microarray Gene Expression Classification Problem

Gene expression detection is a key bioinformatics problem that has been tackled as the classification of microarray gene expression data, obtained by light reflection analysis of genomic material. A typical microarray dataset may contain thousands of genes but only a small number of patterns (often fewer than two hundred). When a dataset has these characteristics, state-of-the-art classification models tend to perform poorly. A two-stage algorithm is proposed to address the problem of microarray classification. In the first stage, two filter algorithms identify salient expression genes from among thousands of genes. In the second stage, the proposed classification methodology is applied using the selected gene subsets as new input variables. The methodology combines Logistic Regression (LR) with Evolutionary Generalized Radial Basis Function (EGRBF) neural networks, which have been shown in previous research to be highly accurate in the modeling of high-dimensional patterns. Finally, the results obtained are compared using nonparametric statistical tests, which confirm good synergy between the EGRBF and LR models.
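The two-stage structure described above can be illustrated with a minimal sketch on synthetic data. Everything in it is an assumption made for illustration and not the authors' implementation: the filter is a simple class-mean-difference ranking (a stand-in for the two filter algorithms, which the abstract does not name), the basis function takes the generalized Gaussian form exp(-(d/r)^tau), the centers are fixed at the class means instead of being evolved, and the function names (`rank_genes`, `grbf`, `fit_logistic`, `predict`) are hypothetical.

```python
import math
import random

def rank_genes(X, y, k):
    """Stage 1 (stand-in filter): score each gene by |mean_1 - mean_0| / std, keep the top k."""
    n = len(X)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n) or 1.0
        pos = [v for v, t in zip(col, y) if t == 1]
        neg = [v for v, t in zip(col, y) if t == 0]
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)) / std)
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]

def grbf(x, center, radius, tau):
    """Assumed generalized Gaussian basis: exp(-(||x - c|| / r) ** tau); tau = 2 gives the usual RBF."""
    return math.exp(-((math.dist(x, center) / radius) ** tau))

def sigmoid(a):
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def fit_logistic(Z, y, lr=0.1, epochs=300):
    """Binary logistic regression fitted by batch gradient descent; w[0] is the bias."""
    w = [0.0] * (len(Z[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for z, t in zip(Z, y):
            err = sigmoid(w[0] + sum(wi * zi for wi, zi in zip(w[1:], z))) - t
            grad[0] += err
            for i, zi in enumerate(z):
                grad[i + 1] += err * zi
        w = [wi - lr * g / len(Z) for wi, g in zip(w, grad)]
    return w

def predict(w, z):
    return 1 if sigmoid(w[0] + sum(wi * zi for wi, zi in zip(w[1:], z))) >= 0.5 else 0

# Synthetic "microarray": 40 patterns, 200 genes, only genes 0-2 carry class signal.
random.seed(0)
n_samples, n_genes, informative = 40, 200, (0, 1, 2)
y = [i % 2 for i in range(n_samples)]
X = [[random.gauss(0.0, 1.0) + ((2.0 if t else -2.0) if j in informative else 0.0)
      for j in range(n_genes)] for t in y]

# Stage 1: reduce thousands (here, hundreds) of genes to a small subset.
selected = rank_genes(X, y, k=3)
Xs = [[row[j] for j in selected] for row in X]

# Stage 2: fixed centers at the class means (assumption: no evolutionary search here).
centers = []
for label in (0, 1):
    rows = [x for x, t in zip(Xs, y) if t == label]
    centers.append([sum(c) / len(rows) for c in zip(*rows)])
radius = math.dist(centers[0], centers[1]) / 2.0

# LR covariates: the selected genes plus one GRBF output per center.
Z = [list(x) + [grbf(x, c, radius, tau=1.5) for c in centers] for x in Xs]
w = fit_logistic(Z, y)
accuracy = sum(predict(w, z) == t for z, t in zip(Z, y)) / n_samples
print(sorted(selected), accuracy)
```

The accuracy printed here is measured on the training patterns of an easy toy problem, so it says nothing about generalization; a real experiment would use held-out folds and an evolutionary search over the number of basis functions, centers, radii and exponents.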
