An Approach to Instance Reduction in Supervised Learning

The paper proposes a set of simple heuristic algorithms for the instance reduction problem. The proposed algorithms can be used to increase the efficiency of supervised learning: a reduced training set consisting of selected instances is used as input to the machine-learning algorithm, which may reduce the time needed for learning, improve learning quality, or both. The paper presents four algorithms for reducing the size of a training set. Each algorithm calculates a similarity coefficient for every instance in the original training set. The coefficient values are used to group instances into clusters, and only a limited number of instances from each cluster is selected to form the reduced training set. One of the proposed algorithms uses a population-learning algorithm to select the instances. The approach has been validated by means of a computational experiment.
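The pipeline described above (coefficient, clustering, per-cluster selection) can be illustrated with a minimal Python sketch. The abstract does not define the similarity coefficient or the selection rule, so everything specific here is an assumption made for illustration: the coefficient is taken to be the dot product of an instance with the per-attribute column sums (rounded so that near-equal values share a cluster), clusters are formed separately within each class, and instances are picked at random rather than by the population-learning algorithm. The names `similarity_coefficients` and `reduce_training_set` are hypothetical.

```python
import random
from collections import defaultdict


def similarity_coefficients(instances, decimals=1):
    """Assumed coefficient (not the paper's definition): dot product of each
    instance with the per-attribute column sums, rounded to `decimals`."""
    n_attrs = len(instances[0])
    col_sums = [sum(row[j] for row in instances) for j in range(n_attrs)]
    return [
        round(sum(x * s for x, s in zip(row, col_sums)), decimals)
        for row in instances
    ]


def reduce_training_set(instances, labels, per_cluster=1, decimals=1, seed=0):
    """Return sorted indices of the instances kept in the reduced set."""
    rng = random.Random(seed)
    coeffs = similarity_coefficients(instances, decimals)

    # Group instances that share a class label and a coefficient value
    # (grouping within each class is an assumption of this sketch).
    clusters = defaultdict(list)
    for idx, (coeff, label) in enumerate(zip(coeffs, labels)):
        clusters[(label, coeff)].append(idx)

    # Keep at most `per_cluster` instances from every cluster.
    # Random choice stands in for the paper's selection heuristics.
    selected = []
    for members in clusters.values():
        k = min(per_cluster, len(members))
        selected.extend(rng.sample(members, k))
    return sorted(selected)


if __name__ == "__main__":
    X = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
    y = [0, 0, 1, 1, 1]
    kept = reduce_training_set(X, y, per_cluster=1)
    print("kept indices:", kept)
```

The reduced set is then `[X[i] for i in kept]` with the matching labels, and it can be passed to any supervised learner in place of the full training set. Raising `per_cluster` or `decimals` keeps more instances, which trades reduction rate against fidelity to the original data.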
