GA-based feature subset clustering for combination of multiple nearest neighbors classifiers

The nearest neighbor classifier (NNC) is stable with respect to changes in the training set but sensitive to variations in the feature set, so combining multiple NNCs built on different feature subsets may outperform the standard NNC. In this paper, we develop FC-MNNC, a method based on feature subset clustering for combining multiple NNCs to achieve better performance than a single NNC. A genetic algorithm (GA) clusters the features into subsets so as to maximize the combined classification accuracy. The NNCs built on the resulting feature subsets classify each pattern in parallel and independently, and the final decision is aggregated by the majority voting rule, a simple and efficient combination technique. To evaluate FC-MNNC, we run experiments on four UCI data sets and compare it with (i) the standard NNC, (ii) GA-based feature selection for an individual NNC, and (iii) GA-based feature subset selection for multiple NNCs. The experimental results show that FC-MNNC is more accurate than both the standard NNC and GA-based feature selection for an individual classifier, and that it performs no worse than GA-based feature subset selection for multiple NNCs. The experiments also demonstrate that FC-MNNC is robust to irrelevant features.
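To make the decision stage concrete, the following is a minimal sketch of the combination scheme described above, assuming scikit-learn's KNeighborsClassifier as the base NNC. The GA-driven feature clustering is elided: the feature subsets below are hard-coded placeholders standing in for a partition the GA would discover by maximizing combined classification accuracy.

```python
# Hedged sketch of the FC-MNNC decision stage: one 1-NN classifier per
# feature subset, combined by majority voting. The subsets here are
# hypothetical; the paper obtains them via GA-based feature clustering.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Placeholder feature clusters (a partition of the 4 iris features).
feature_subsets = [[0, 1], [2, 3]]

# Train one independent 1-NN classifier per feature subset.
members = [
    (subset, KNeighborsClassifier(n_neighbors=1).fit(X_tr[:, subset], y_tr))
    for subset in feature_subsets
]

# Each member votes on every test pattern; majority vote decides.
votes = np.stack([clf.predict(X_te[:, s]) for s, clf in members])
final = np.array([np.bincount(col).argmax() for col in votes.T])

print("ensemble accuracy:", (final == y_te).mean())
```

In the full method, a GA would search over such partitions, scoring each candidate by the accuracy of the resulting ensemble, so the voting routine above doubles as the GA's fitness evaluation.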
