Toward the scalability of neural networks through feature selection

In the past few years, the bottleneck for machine learning developers has no longer been the limited amount of available data, but rather the inability of algorithms to process all of that data in the available time. For this reason, researchers are now interested not only in the accuracy but also in the scalability of machine learning algorithms. When dealing with large-scale databases, feature selection can help reduce their dimensionality, turning an impracticable algorithm into a practical one. In this research, the influence of several feature selection methods on the scalability of four of the most well-known training algorithms for feedforward artificial neural networks (ANNs) is analyzed over both classification and regression tasks. The results demonstrate that feature selection is an effective tool for improving scalability.
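
As an illustration of the kind of pipeline studied here, the sketch below applies a filter-style feature selection step before training a feedforward network, and compares training time and accuracy with and without it. This is a minimal example assuming scikit-learn, a synthetic dataset, mutual information as the filter score, and 25 retained features; none of these choices reflect the paper's actual experimental setup.

```python
# Sketch: filter-based feature selection before feedforward ANN training.
# Dataset, scorer, and k are illustrative assumptions, not the paper's setup.
import time

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic high-dimensional data standing in for a large-scale dataset.
X, y = make_classification(n_samples=5000, n_features=500,
                           n_informative=25, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def fit_and_score(X_train, X_test):
    """Train a small feedforward net and report training time and accuracy."""
    net = MLPClassifier(hidden_layer_sizes=(50,), max_iter=200, random_state=0)
    start = time.perf_counter()
    net.fit(X_train, y_tr)
    return time.perf_counter() - start, net.score(X_test, y_te)

# Baseline: train on all 500 features.
t_full, acc_full = fit_and_score(X_tr, X_te)

# Filter selection (mutual information as a stand-in filter) before training.
selector = SelectKBest(mutual_info_classif, k=25).fit(X_tr, y_tr)
t_fs, acc_fs = fit_and_score(selector.transform(X_tr), selector.transform(X_te))

print(f"all features:      {t_full:.1f}s, accuracy {acc_full:.3f}")
print(f"selected features: {t_fs:.1f}s, accuracy {acc_fs:.3f}")
```

The comparison mirrors the scalability question the abstract raises: the selection step adds a one-off cost, but training on the reduced input dimensionality is typically much cheaper, which is what can turn an otherwise impracticable training run into a practical one.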
