Sample Set Reduction Method Based on Neighborhood Non-Dominated Crowding-Distance Sorting

To remove redundant samples and reduce the sample set size in big data analysis while preserving classification accuracy, a sample set reduction method based on Neighborhood Non-dominated Crowding-distance Sorting (NNCS) is proposed. The method builds on the idea of a neighborhood: the distance between the samples of one class and those of the other classes is taken as the evaluation criterion. The multi-objective optimization algorithm NSGA-II is then adapted to select the key samples by non-dominated sorting and crowding distance, which yields the reduced sample set. Experimental results on nine UCI benchmark data sets show that the method generalizes well and reduces the sample set size effectively while fully preserving classification accuracy.
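
The sketch below illustrates the general mechanism described in the abstract: each sample is scored by neighborhood-based objectives and the reduced set is selected front by front using NSGA-II-style non-dominated sorting with crowding-distance tie-breaking. The two objectives used here (distance to the nearest other-class neighbor and negated distance to the nearest same-class neighbor) are illustrative assumptions, as are the function names; the paper's exact NNCS criteria may differ.

```python
# Minimal sketch of neighborhood non-dominated crowding-distance selection.
# Objective definitions are assumptions for illustration, not the paper's exact formulation.
import numpy as np

def neighborhood_objectives(X, y):
    """Per-sample objectives (both minimized): distance to the nearest
    other-class sample (favors boundary samples) and negated distance to the
    nearest same-class sample (penalizes redundant, densely packed samples)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    same = y[:, None] == y[None, :]
    intra = np.where(same, d, np.inf).min(axis=1)   # nearest same-class neighbor
    inter = np.where(~same, d, np.inf).min(axis=1)  # nearest other-class neighbor
    return np.column_stack([inter, -intra])

def non_dominated_sort(F):
    """Fast non-dominated sorting as in NSGA-II; returns a list of fronts (index lists)."""
    n = len(F)
    dominated_by = [[] for _ in range(n)]
    dom_count = np.zeros(n, dtype=int)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if np.all(F[i] <= F[j]) and np.any(F[i] < F[j]):
                dominated_by[i].append(j)
            elif np.all(F[j] <= F[i]) and np.any(F[j] < F[i]):
                dom_count[i] += 1
    fronts, current = [], [i for i in range(n) if dom_count[i] == 0]
    while current:
        fronts.append(current)
        nxt = []
        for i in current:
            for j in dominated_by[i]:
                dom_count[j] -= 1
                if dom_count[j] == 0:
                    nxt.append(j)
        current = nxt
    return fronts

def crowding_distance(F, front):
    """Crowding distance of the members of one front (boundary points get infinity)."""
    dist = np.zeros(len(front))
    for m in range(F.shape[1]):
        values = F[front, m]
        order = np.argsort(values)
        dist[order[0]] = dist[order[-1]] = np.inf
        span = values.max() - values.min() or 1.0
        dist[order[1:-1]] += (values[order[2:]] - values[order[:-2]]) / span
    return dist

def reduce_samples(X, y, keep):
    """Select `keep` sample indices front by front, breaking ties by crowding distance."""
    F = neighborhood_objectives(X, y)
    selected = []
    for front in non_dominated_sort(F):
        if len(selected) + len(front) <= keep:
            selected.extend(front)
        else:
            order = np.argsort(-crowding_distance(F, front))
            selected.extend(np.asarray(front)[order[:keep - len(selected)]])
            break
    return np.asarray(selected)
```

In use, something like `reduce_samples(X, y, keep=len(X) // 2)` would return the indices of the retained samples, which could then be passed to any downstream classifier (e.g., an SVM or k-NN) to check that accuracy is preserved on the reduced set.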
