O(m log m) instance selection algorithms—RR-DROPs

This paper focuses on instance selection algorithms for classification. We propose new, fast versions of the DROP algorithms with the computational complexity reduced to O(m log m), compared with the O(m³) complexity of the original algorithms. The new RR-DROP algorithms use random region hashing forests and jungles, together with several other data structures, to keep the computational complexity as low as possible. The proposed algorithms can be applied to huge datasets while classification accuracy remains unchanged, as shown by a statistical analysis on several datasets.

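The abstract only names the building blocks (a DROP-style editing rule plus an approximate neighbour index), so the sketch below is a minimal illustration of the DROP decision rule, not the authors' RR-DROP implementation: scikit-learn's exact KD-tree neighbour search stands in for the random region hashing forests and jungles, and all names (drop_select, k, the iris data) are hypothetical choices made for this example.

```python
# Minimal DROP-style instance selection sketch (NOT the paper's RR-DROP code).
# Assumption: an exact NearestNeighbors index replaces the random region
# hashing forests/jungles used in the paper; names here are illustrative only.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.datasets import load_iris

def drop_select(X, y, k=3):
    m = len(X)
    # Query k+2 neighbours: drop the zero-distance hit (the point itself)
    # and keep k+1 candidates so k remain after one is excluded.
    nn = NearestNeighbors(n_neighbors=k + 2).fit(X)
    _, idx = nn.kneighbors(X)
    neighbors = [list(row[1:]) for row in idx]

    # associates[j] = indices of instances that have j among their neighbours
    associates = [[] for _ in range(m)]
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            associates[j].append(i)

    keep = np.ones(m, dtype=bool)

    def vote(i, exclude=None):
        # Majority class among the first k still-kept neighbours of i.
        votes = [y[j] for j in neighbors[i] if keep[j] and j != exclude][:k]
        if not votes:
            return None
        vals, counts = np.unique(votes, return_counts=True)
        return vals[np.argmax(counts)]

    for p in range(m):
        # Remove p if its associates are classified at least as well without it.
        with_p = sum(vote(a) == y[a] for a in associates[p])
        without_p = sum(vote(a, exclude=p) == y[a] for a in associates[p])
        if without_p >= with_p:
            keep[p] = False
    return keep

X, y = load_iris(return_X_y=True)
mask = drop_select(X, y)
print(f"kept {mask.sum()} of {len(X)} instances")
```

Swapping the exact index for an approximate one (in the paper, the random region hashing forests and jungles) is what keeps each neighbour query cheap and, per the abstract, brings the overall cost down to O(m log m).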