Using Growing Neural Gas in Prototype Generation for Nearest Neighbor Classifiers

Instance-based learning algorithms, such as nearest neighbor (NN) classifiers, store all training instances and consult them when making predictions, which incurs high storage and computational costs. One way to reduce these costs is to shrink the training dataset in a pre-processing step. This work deals with prototype generation, where new data points are generated from the original dataset. Reduction is achieved by retaining fewer instances, namely prototypes that stand in for the most representative regions of the dataset. Here, Growing Neural Gas networks are employed to generate the prototype instances. Experimentally, NN classifiers using the reduced datasets maintained accuracy close to that of NN classifiers using the whole dataset.
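The idea can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a simplified Growing Neural Gas is trained separately on each class, and that the resulting units are used as labeled prototypes for a 1-NN classifier. The abstract does not state how labels are assigned, so the per-class training is an assumption, as are all function names, hyperparameter values, and the synthetic two-class data.

```python
import numpy as np

def gng_prototypes(X, max_units=10, n_steps=2000, eps_b=0.05, eps_n=0.006,
                   max_age=50, lam=100, alpha=0.5, decay=0.995, seed=0):
    """Simplified Growing Neural Gas; returns unit positions (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Start with two units placed on two distinct training points.
    W = X[rng.choice(len(X), size=2, replace=False)].astype(float)
    err = np.zeros(2)
    age = -np.ones((2, 2))  # age[i, j] < 0 means "no edge between i and j"
    for step in range(1, n_steps + 1):
        x = X[rng.integers(len(X))]
        d2 = ((W - x) ** 2).sum(axis=1)
        s1, s2 = np.argsort(d2)[:2]          # winner and runner-up
        nbrs = age[s1] >= 0                   # topological neighbors of the winner
        age[s1, nbrs] += 1                    # age the winner's edges (kept symmetric)
        age[nbrs, s1] += 1
        err[s1] += d2[s1]                     # accumulate squared error at the winner
        W[s1] += eps_b * (x - W[s1])          # move winner toward the sample
        W[nbrs] += eps_n * (x - W[nbrs])      # move its neighbors slightly
        age[s1, s2] = age[s2, s1] = 0         # create or refresh the winner edge
        age[age > max_age] = -1               # drop edges that are too old
        keep = (age >= 0).any(axis=1)         # drop units left without edges
        W, err, age = W[keep], err[keep], age[keep][:, keep]
        if step % lam == 0 and len(W) < max_units:
            q = int(np.argmax(err))                      # unit with largest error
            q_nbrs = np.flatnonzero(age[q] >= 0)
            f = int(q_nbrs[np.argmax(err[q_nbrs])])      # its worst neighbor
            W = np.vstack([W, 0.5 * (W[q] + W[f])])      # new unit halfway between
            err[q] *= alpha
            err[f] *= alpha
            err = np.append(err, err[q])
            age = np.pad(age, ((0, 1), (0, 1)), constant_values=-1)
            r = len(W) - 1
            age[q, f] = age[f, q] = -1                   # replace edge q-f ...
            age[q, r] = age[r, q] = 0                    # ... with q-r and r-f
            age[f, r] = age[r, f] = 0
        err *= decay                                     # global error decay
    return W

def generate_prototypes(X, y, **gng_kwargs):
    """Assumption: one GNG per class; its units inherit that class label."""
    protos, labels = [], []
    for c in np.unique(y):
        P = gng_prototypes(X[y == c], **gng_kwargs)
        protos.append(P)
        labels.append(np.full(len(P), c))
    return np.vstack(protos), np.concatenate(labels)

def predict_1nn(P, p_labels, X):
    """1-NN prediction against the prototype set instead of the full data."""
    d2 = ((X[:, None, :] - P[None, :, :]) ** 2).sum(axis=2)
    return p_labels[d2.argmin(axis=1)]

# Synthetic two-class data, for illustration only.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(6, 1, (200, 2))])
y = np.repeat([0, 1], 200)
P, p_labels = generate_prototypes(X, y, max_units=10)
acc = (predict_1nn(P, p_labels, X) == y).mean()
print(len(X), "instances ->", len(P), "prototypes; training accuracy:", acc)
```

In this sketch the classifier consults at most `max_units` prototypes per class rather than every training instance, which is where the storage and prediction-time savings would come from. The reduction rate is controlled here by `max_units`; other stopping criteria are possible.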
