Ranking-based instance selection for pattern classification

Abstract: Instance-based learning algorithms must store a large number of training examples, which leads to large memory requirements, oversensitivity to noise, and slow execution. Instance selection techniques can improve the performance of these algorithms by keeping only the most useful instances of the original data set and removing, for example, redundant information and noisy points. The relationship between an instance and the other patterns in the training set plays an important role and can affect whether a learning algorithm misclassifies it. This relationship can be expressed as a value that measures how difficult the instance is to classify. Based on that, we introduce a novel instance selection algorithm called Ranking-based Instance Selection (RIS), which assigns each instance a score that depends on its relationship with all other instances in the training set. Instances with higher scores form safe regions (neighborhoods of samples with relatively homogeneous class labels) in the feature space, and instances with lower scores form an indecision region (borderline samples of different classes). This information is then used in a selection process that removes, from both safe and indecision regions, the instances considered irrelevant for representing their clusters in the feature space. In contrast to previous algorithms, the proposal combines a ranking procedure with a selection process, aiming for a promising tradeoff between accuracy and reduction rate. Experiments on twenty-four real-world classification problems show the effectiveness of the RIS algorithm compared with other instance selection algorithms in the literature.
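
The scoring idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the score formula, so the form used here is an assumption made only for illustration: each other instance contributes a softmax weight over negative Euclidean distances, counted positively when it shares the class label and negatively otherwise. The function name `instance_scores` and the toy data are hypothetical, and the selection step that follows the ranking is not shown.

```python
import math

def instance_scores(X, y):
    """Signed, distance-weighted score per instance.

    ASSUMPTION: this formula is illustrative and is not taken from the
    abstract. Scores lie in [-1, 1]; values near 1 suggest a homogeneous
    neighbourhood, values near or below 0 suggest a borderline instance.
    """
    n = len(X)
    scores = []
    for i in range(n):
        # Softmax-style weights over negative distances to all other instances
        weights = {
            j: math.exp(-math.dist(X[i], X[j]))
            for j in range(n) if j != i
        }
        total = sum(weights.values())
        # Same-class neighbours add to the score, other-class ones subtract
        score = sum(
            (w / total) * (1 if y[j] == y[i] else -1)
            for j, w in weights.items()
        )
        scores.append(score)
    return scores

# Hypothetical toy data: two compact clusters plus one point of class 0
# that sits between them, closer to the class-1 cluster.
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (3, 3)]
y = [0, 0, 0, 1, 1, 1, 0]

scores = instance_scores(X, y)
# Rank instances from highest score (safest) to lowest (most borderline)
ranking = sorted(range(len(X)), key=lambda i: scores[i], reverse=True)
for i in ranking:
    print(i, X[i], y[i], round(scores[i], 3))
```

Under this assumed score, the points inside each cluster rank first and the in-between point ranks last, which mirrors the safe-region versus indecision-region distinction drawn in the abstract.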
