The Data Selection Criteria for HSC and SVM Algorithms

This paper discusses consistent subset (CS) selection criteria for Hyper Surface Classification (HSC) and SVM algorithms. Consistent subsets play an important role in data selection. First, the paper proposes that the minimal consistent subset for a disjoint cover set (MCSC) is central to data selection for HSC: the MCSC can be used to select a representative subset from the original sample set, yields the same classification model as the entire sample set, and fully preserves its classification ability. Second, the number of MCSCs is calculated. Third, by comparing the performance of HSC and SVM on their corresponding consistent subsets, we argue that it is not reasonable to train different classifiers on one shared training set and then evaluate them all on one shared test set across different algorithms. The experiments show that each algorithm can select its own proper training set, which ensures good performance and generalization ability: the MCSC is the best selection for HSC, and the support vector set is the effective selection for SVM.
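
The paper's exact MCSC construction for HSC is not reproduced in this abstract, so the sketch below only illustrates the general notion of a consistent subset: a subset of the training data over which a reference classifier (here 1-nearest-neighbor, in the spirit of the condensed nearest neighbor rule) still classifies every original training sample correctly. The function names and toy data are hypothetical and stand in for whatever selection procedure a given algorithm actually uses.

    # Illustrative sketch only: greedy consistent-subset selection with a 1-NN
    # consistency check. This is NOT the paper's MCSC algorithm for HSC; it only
    # demonstrates the consistent-subset idea the abstract refers to.
    import numpy as np

    def nearest_label(point, subset_X, subset_y):
        """Label of the 1-nearest neighbor of `point` within the current subset."""
        dists = np.linalg.norm(subset_X - point, axis=1)
        return subset_y[np.argmin(dists)]

    def greedy_consistent_subset(X, y):
        """Grow a subset until every training sample is classified correctly
        by 1-NN over the subset, i.e. the subset is consistent with X, y."""
        keep = [0]                        # seed with the first sample
        changed = True
        while changed:                    # repeat full passes until no additions
            changed = False
            for i in range(len(X)):
                if nearest_label(X[i], X[keep], y[keep]) != y[i]:
                    keep.append(i)        # absorb any misclassified sample
                    changed = True
        return np.array(sorted(keep))

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 2))
        y = (X[:, 0] + X[:, 1] > 0).astype(int)   # toy two-class labels
        idx = greedy_consistent_subset(X, y)
        print(f"kept {len(idx)} of {len(X)} samples")

Note that this greedy loop guarantees consistency but not minimality; finding a truly minimal consistent subset, as MCSC does for HSC's disjoint cover sets or as the support vector set effectively does for SVM, is the harder selection problem the paper addresses.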
