AIB2: an abstraction data reduction technique based on IB2

Data reduction improves the efficiency of k-NN classifier on large datasets since it accelerates the classification process and reduces storage requirements for the training data. IB2 is an effective data reduction technique that selects some training items form the initial dataset and uses them as representatives (prototypes). Contrary to many other data reduction techniques, IB2 is a very fast, one-pass method that builds its reduced (condensing) set in an incremental manner. New training data can update the condensing set without the need of the "old" removed items. This paper proposes AIB2, a variation of IB2, which generates new prototypes instead of selecting them. AIB2 attempts to improve the efficiency of IB2 by positioning the prototypes in the center of the data areas they represent. The experimental study shows that AIB2 performs better than IB2.

[1]  Charu C. Aggarwal,et al.  Data Streams - Models and Algorithms , 2014, Advances in Database Systems.

[2]  Hanan Samet,et al.  Foundations of Multidimensional and Metric Data Structures (The Morgan Kaufmann Series in Computer Graphics and Geometric Modeling) , 2005 .

[3]  David W. Aha,et al.  Instance-Based Learning Algorithms , 1991, Machine Learning.

[4]  Francisco Herrera,et al.  Prototype Selection for Nearest Neighbor Classification: Taxonomy and Empirical Study , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Jesús Alcalá-Fdez,et al.  KEEL Data-Mining Software Tool: Data Set Repository, Integration of Algorithms and Experimental Analysis Framework , 2011, J. Multiple Valued Log. Soft Comput..

[6]  David W. Aha,et al.  Tolerating Noisy, Irrelevant and Novel Attributes in Instance-Based Learning Algorithms , 1992, Int. J. Man Mach. Stud..

[7]  Francisco Herrera,et al.  A Taxonomy and Experimental Study on Prototype Generation for Nearest Neighbor Classification , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[8]  Alexey Tsymbal,et al.  The problem of concept drift: definitions and related work , 2004 .

[9]  José Francisco Martínez Trinidad,et al.  A review of instance selection methods , 2010, Artificial Intelligence Review.

[10]  Peter E. Hart,et al.  The condensed nearest neighbor rule (Corresp.) , 1968, IEEE Trans. Inf. Theory.

[11]  Janez Demsar,et al.  Statistical Comparisons of Classifiers over Multiple Data Sets , 2006, J. Mach. Learn. Res..

[12]  Marek Grochowski,et al.  Comparison of Instances Seletion Algorithms I. Algorithms Survey , 2004, ICAISC.

[13]  Charu C. Aggarwal,et al.  Data Streams: Models and Algorithms (Advances in Database Systems) , 2006 .

[14]  C. G. Hilborn,et al.  The Condensed Nearest Neighbor Rule , 1967 .

[15]  Marek Grochowski,et al.  Comparison of Instance Selection Algorithms II. Results and Comments , 2004, ICAISC.

[16]  Eyke Hüllermeier,et al.  Efficient instance-based learning on data streams , 2007, Intell. Data Anal..

[17]  Hanan Samet,et al.  Foundations of multidimensional and metric data structures , 2006, Morgan Kaufmann series in data management systems.

[18]  Belur V. Dasarathy,et al.  Nearest neighbor (NN) norms: NN pattern classification techniques , 1991 .