IRAHC: Instance Reduction Algorithm using Hyperrectangle Clustering

Abstract Instance-based classifiers need to store a large number of samples as their training set. In this work, we propose an instance reduction method based on hyperrectangle clustering, called the Instance Reduction Algorithm using Hyperrectangle Clustering (IRAHC). IRAHC removes non-border (interior) instances and keeps border and near-border ones. A hyperrectangle is an n-dimensional rectangle with axis-aligned sides, defined by min and max points and a corresponding distance function; the min–max points are determined by the hyperrectangle clustering algorithm. Instance-based learning algorithms are often confronted with the problem of deciding which instances must be stored for use at test time: storing too many instances results in large memory requirements and slow execution speed. In IRAHC, the core of the instance reduction process is a set of hyperrectangles. Performance has been evaluated on real-world data sets from the UCI repository using 10-fold cross-validation. The experimental results have been compared with state-of-the-art methods and show the superiority of the proposed method in terms of classification accuracy and reduction percentage.
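
The abstract defines a hyperrectangle by its min and max points together with a distance function, and describes reduction as keeping border and near-border instances while discarding interior ones. The paper's exact procedure is not reproduced here; the following Python sketch is only a hedged illustration of those two ideas. The `Hyperrectangle` class, the `keep_border_instances` helper, and the `eps` threshold are assumptions introduced for illustration, not the authors' API.

```python
# Minimal sketch of the two ideas named in the abstract: an axis-aligned
# hyperrectangle with a distance function, and a border-keeping reduction
# step. Names and the eps threshold are illustrative assumptions.
import numpy as np

class Hyperrectangle:
    """Axis-aligned n-dimensional rectangle defined by min and max points."""

    def __init__(self, min_point, max_point):
        self.min = np.asarray(min_point, dtype=float)
        self.max = np.asarray(max_point, dtype=float)

    def distance(self, x):
        """Euclidean distance from x to the rectangle (0 if x lies inside)."""
        x = np.asarray(x, dtype=float)
        # Per-dimension gap: how far x falls outside [min, max] on each axis.
        gap = np.maximum(np.maximum(self.min - x, x - self.max), 0.0)
        return np.linalg.norm(gap)

def keep_border_instances(X, cluster_ids, rects, eps):
    """Hypothetical reduction step: discard instances that lie deeper than
    eps inside their cluster's hyperrectangle, keep border/near-border ones."""
    kept = []
    for x, c in zip(X, cluster_ids):
        r = rects[c]
        # Depth inside the rectangle: distance to the nearest face
        # (clamped to 0 for points on or outside the boundary).
        depth = max(0.0, float(np.min(np.minimum(x - r.min, r.max - x))))
        if depth <= eps:  # border or near-border instance: retain it
            kept.append(x)
    return np.array(kept)

# Example: a unit square in 2-D; (2, 0) is 1 unit outside, (0.5, 0.5) inside.
r = Hyperrectangle([0, 0], [1, 1])
print(r.distance([2.0, 0.0]))   # -> 1.0
print(r.distance([0.5, 0.5]))   # -> 0.0
```

Under this reading, interior instances (large depth) are exactly the ones IRAHC removes, since they carry little information about class boundaries, while instances within eps of a hyperrectangle face approximate the border and near-border set the abstract says is kept.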
