Hybrid Feature Selection Method Based on Neural Networks and Cross-Validation for Liver Cancer With Microarray

This paper proposes a method that extracts a feature set for accurate disease diagnosis from a feature (aptamer) array. The method combines a neural network with 10-fold cross-validation and is verified using the p-values of the aptamer array responses to specimens from 80 liver cancer patients and 310 healthy people. The proposed method is compared with the one-way ANOVA method in terms of accuracy, number of features, and the computing time needed to determine the feature set required to achieve the same accuracy. For both methods, diagnostic accuracy improves markedly as the number of features increases from 2 to 10. With 10 features, the accuracies are 93.5% for our method and 87.5% for the one-way ANOVA, and the increases in accuracy per additional feature are 3.39% and 2.65%, respectively. To reach the same accuracy, our method needs only one-half to one-third as many features as the one-way ANOVA. An interesting statistical characteristic of cross-validation is that diagnostic accuracy saturates after 10 000 cross-validations.
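As a rough illustration of the one-way ANOVA baseline and the 10-fold splitting mentioned above, the following minimal sketch ranks features by their F-statistic and partitions samples into folds. It is not the authors' implementation: the function names (`anova_f`, `rank_features`, `k_fold_indices`), the shuffling seed, and the toy data are all assumptions introduced for illustration, and the neural-network stage of the proposed method is not shown.

```python
import random
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F-statistic for a list of groups (each a list of numbers)."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the samples around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def rank_features(X, y):
    """Return feature indices sorted by decreasing F-statistic.

    X is a list of samples (rows of feature values); y holds the class labels.
    """
    labels = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        groups = [[row[j] for row, lab in zip(X, y) if lab == c] for c in labels]
        scores.append(anova_f(groups))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


# Toy data (made up for illustration): feature 0 separates the classes,
# feature 1 does not.
X_toy = [[1, 10.0], [2, 10.5], [3, 9.5], [4, 10.2], [5, 9.8], [6, 10.1]]
y_toy = [0, 0, 0, 1, 1, 1]
ranking = rank_features(X_toy, y_toy)

# 390 samples, matching the 80 + 310 specimens in the abstract.
folds = k_fold_indices(390, k=10)
```

In a wrapper-style approach, each fold would in turn serve as the held-out test set while a classifier is trained on the remaining nine; the ANOVA ranking, by contrast, scores each feature independently of any classifier.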
