A model to estimate the Self-Organizing Maps grid dimension for Prototype Generation

Owing to its high accuracy across a wide range of problems, the K-nearest neighbor (KNN) algorithm is one of the most important classifiers in data mining applications and is recognized in the literature as a benchmark. Despite this accuracy, KNN has weaknesses, notably the time taken by the classification process, which is a disadvantage in many problems, particularly those involving large datasets. The literature presents approaches that reduce KNN classification time by retaining only the most important dataset examples. One of these methods, Prototype Generation (PG), represents the dataset examples by a smaller set of prototypes, so that classification proceeds in two steps: the first searches among the prototypes, and the second among the examples represented by the nearest prototypes. The main shortcoming of this approach is the lack of a principled definition of the ideal number of prototypes. This study proposes a model that estimates the best Self-Organizing Map grid dimension, and hence the ideal number of prototypes, using the number of dataset examples as its only parameter. The approach is compared with other artificial-intelligence-based PG methods from the literature that automatically define the number of prototypes. Evaluated on eighteen public datasets, the proposed method achieves a better trade-off between a reduced number of prototypes and accuracy, providing a prototype count that does not degrade KNN classification performance.
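The two-step classification scheme described above can be sketched in a few lines. This is a minimal illustration, not the paper's method: the abstract does not give the proposed estimation model, so the grid-dimension function below assumes the widely used rule of thumb of roughly 5*sqrt(N) map units for N examples, and the names `estimate_grid_dim`, `two_step_knn`, and `proto_members` are hypothetical.

```python
import numpy as np

def estimate_grid_dim(n_examples):
    """Estimate a square SOM grid side from the number of examples.

    Assumes the common heuristic of ~5*sqrt(N) total map units
    (an assumption -- the paper's own model is not reproduced here).
    """
    n_units = 5 * np.sqrt(n_examples)
    side = max(2, int(round(np.sqrt(n_units))))
    return side, side

def two_step_knn(x, prototypes, proto_members, train_X, train_y, k=1):
    """Classify x in two steps: nearest prototype, then KNN among
    the training examples mapped to that prototype."""
    # Step 1: find the nearest prototype (e.g., a trained SOM unit).
    dists_p = np.linalg.norm(prototypes - x, axis=1)
    p = int(np.argmin(dists_p))
    # Step 2: classic KNN restricted to the examples that prototype represents.
    idx = proto_members[p]
    cand_X, cand_y = train_X[idx], train_y[idx]
    dists = np.linalg.norm(cand_X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(cand_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]  # majority vote among the k neighbors
```

Because step 2 only inspects the examples assigned to one prototype, the cost per query drops from O(N) distance computations to roughly O(P + N/P) for P prototypes, which is why the choice of P (i.e., the grid dimension) drives the speed/accuracy trade-off the abstract discusses.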
