Instance Selection Optimization for Neural Network Training

Performing instance selection prior to classifier training is always beneficial in terms of reducing the computational complexity of training, and is sometimes also beneficial in terms of improving prediction accuracy. Removing noisy instances improves prediction accuracy, while removing redundant and irrelevant instances does not negatively affect it. In practice, however, instance selection methods usually also remove some instances that should not be removed from the training dataset, which decreases prediction accuracy. We discuss two methods of dealing with this problem. The first is parameterization of the instance selection algorithms, which allows choosing how aggressively instances are removed; the second is embedding the instance selection directly into the prediction model, which in our case is an MLP neural network.
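To make the "aggressiveness parameter" idea concrete, here is a minimal sketch of a parameterized noise filter in the spirit of Wilson's Edited Nearest Neighbor (ENN) rule. The `threshold` parameter is our illustrative knob, not necessarily the parameterization used in the paper: an instance is kept only if at least that fraction of its `k` nearest neighbors share its class, so raising the threshold removes instances more aggressively.

```python
import numpy as np

def enn_select(X, y, k=3, threshold=0.5):
    """ENN-style instance selection (illustrative sketch).

    Keeps an instance only if at least `threshold` of its k nearest
    neighbors (excluding itself) share its class label.  A higher
    threshold removes instances more aggressively.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(X)
    keep = np.ones(n, dtype=bool)

    # Brute-force pairwise Euclidean distances (fine for a sketch;
    # a k-d tree or approximate search would be used at scale).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # exclude each point from its own neighborhood

    for i in range(n):
        nn = np.argsort(d[i])[:k]            # indices of k nearest neighbors
        agreement = np.mean(y[nn] == y[i])   # fraction sharing i's label
        if agreement < threshold:
            keep[i] = False                  # likely noisy: neighbors disagree
    return X[keep], y[keep]
```

With the default `threshold=0.5` this behaves like the classic majority-vote ENN filter: a mislabeled point surrounded by instances of another class is dropped, while points inside homogeneous regions survive. The reduced set can then be passed to any classifier trainer, such as an MLP.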
