Scalability Analysis of ANN Training Algorithms with Feature Selection

The advent of high-dimensional problems has brought new challenges for machine learning researchers, who are now interested not only in the accuracy but also in the scalability of algorithms. In this context, machine learning can take advantage of feature selection methods to deal with large-scale databases. Feature selection can reduce the temporal and spatial complexity of learning, turning an impracticable algorithm into a practical one. In this work, the influence of feature selection on the scalability of four of the most well-known training algorithms for feedforward artificial neural networks (ANNs) is studied. Six different measures are considered to evaluate scalability, and they are combined into a final score used to compare the algorithms. Results show that, when a feature selection step is included, ANN training algorithms perform much better in terms of scalability.
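As a rough illustration of the spatial-complexity argument, the sketch below counts the trainable parameters of a fully connected feedforward network with one hidden layer, before and after a feature selection step. The layer sizes (1000 inputs reduced to 20, 50 hidden units, 2 outputs) are hypothetical values chosen for the example, not figures from this work, and the function name is likewise an assumption.

```python
def mlp_parameter_count(n_inputs, n_hidden, n_outputs):
    """Weights plus biases of a fully connected network with one hidden layer."""
    hidden_layer = (n_inputs + 1) * n_hidden    # +1 accounts for the bias term
    output_layer = (n_hidden + 1) * n_outputs   # +1 accounts for the bias term
    return hidden_layer + output_layer


# Hypothetical sizes: all 1000 features vs. 20 features kept by a selector.
full = mlp_parameter_count(n_inputs=1000, n_hidden=50, n_outputs=2)
reduced = mlp_parameter_count(n_inputs=20, n_hidden=50, n_outputs=2)

print(f"parameters with all features:      {full}")
print(f"parameters with selected features: {reduced}")
print(f"reduction factor:                  {full / reduced:.1f}x")
```

With the hidden layer size held fixed, the parameter count grows linearly with the number of inputs, so discarding features shrinks the model to be stored and updated at each training step. How this translates into training time depends on the training algorithm used.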
