Effect of the distance functions on the distance-based instance selection for the feed-forward neural network

The distance-based instance selection algorithm for the feed-forward neural network is a numerosity-reduction technique: it reduces the number of instances in the original training set by keeping only the instances that lie at the decision boundary between consecutive classes of data, selected with the Euclidean distance function. This paper studies how nine distance functions affect the reduction performance and the classification performance of the distance-based instance selection algorithm. The evaluation was conducted on real-world data sets from the UCI Machine Learning Repository and the ELENA project. The data-reduction results confirm that the Chebyshev, Cosine, and Minkowski distance functions are recommended for the integer data type, the Minkowski distance function for the categorical data type, and the Jaccard distance function for the real data type. Selecting the initial distance function according to the data type allows the distance-based instance selection algorithm to produce its best classification performance.
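To make the comparison concrete, the sketch below implements several of the distance functions named above (Euclidean, Chebyshev, Minkowski, Cosine, Jaccard) together with a minimal boundary-instance selection rule. The selection rule shown, keeping each instance that is the nearest cross-class neighbor of some other instance, is an assumption for illustration; the paper's exact algorithm and its parameter choices (e.g. the Minkowski order `p`) may differ.

```python
import math

# Hedged sketch of distance functions for plain numeric vectors.
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def minkowski(a, b, p=3):  # order p is an assumed default
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def jaccard(a, b):
    # Tanimoto form of the Jaccard distance for non-negative vectors.
    num = sum(min(x, y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    return 1.0 - num / den if den else 0.0

def select_boundary(instances, labels, dist=euclidean):
    """Keep instances near the decision boundary: an instance is kept
    if it is the nearest neighbor of a different class for some other
    instance (an illustrative rule, not the paper's exact algorithm)."""
    keep = set()
    for i, (xi, yi) in enumerate(zip(instances, labels)):
        best_j, best_d = None, float("inf")
        for j, (xj, yj) in enumerate(zip(instances, labels)):
            if yj != yi:
                d = dist(xi, xj)
                if d < best_d:
                    best_j, best_d = j, d
        if best_j is not None:
            keep.add(best_j)
    return sorted(keep)
```

Swapping `dist=euclidean` for any of the other functions changes which instances fall near the boundary, which is exactly the effect the study measures across data types.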
