Prototype generation on structural data using dissimilarity space representation

Data reduction techniques play a key role in instance-based classification by lowering the amount of data to be processed. Among the existing approaches, prototype selection (PS) and prototype generation (PG) are the most representative ones. The two families differ in how the reduced set is obtained from the initial one: the former selects the most representative elements of the set, whereas the latter creates new data out of it. Although PG is considered to delimit decision boundaries more efficiently, the operations it requires are not well defined in scenarios involving structural data such as strings, trees, or graphs. This work studies the possibility of using dissimilarity space (DS) methods as an intermediate step that maps the initial structural representation to a statistical one, thereby allowing the use of PG methods. A comparative experiment over string data is carried out in which our proposal is compared against PS methods applied in the original space. Results show that the proposed strategy achieves results statistically similar to those of PS in the initial space, thus standing as a clear alternative to the classic approach, with some additional advantages derived from the DS representation.
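As a rough illustration of the DS idea described above, the following Python sketch maps strings to vectors of edit distances to a few pivot strings and then generates one prototype per class in that vector space. Every name in it (edit_distance, to_dissimilarity_space, class_centroids, classify), the hand-picked pivots, and the use of class centroids as the PG step are assumptions made for this example; they are not the pivot selection strategies or PG algorithms evaluated in this work.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def to_dissimilarity_space(items, pivots, dist=edit_distance):
    """Represent each item as its vector of distances to the pivots."""
    return [[float(dist(x, p)) for p in pivots] for x in items]


def class_centroids(vectors, labels):
    """Toy PG step (an assumption): one generated prototype per class, the mean vector."""
    sums, counts = {}, {}
    for v, y in zip(vectors, labels):
        acc = sums.setdefault(y, [0.0] * len(v))
        sums[y] = [s + vi for s, vi in zip(acc, v)]
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def classify(x, pivots, prototypes):
    """Nearest-prototype rule applied in the dissimilarity space."""
    v = to_dissimilarity_space([x], pivots)[0]
    return min(prototypes,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(v, prototypes[y])))


# Hypothetical toy data; the pivots are chosen by hand here.
train = ["aaaa", "aaab", "bbbb", "bbba"]
labels = ["A", "A", "B", "B"]
pivots = ["aaaa", "bbbb"]

vectors = to_dissimilarity_space(train, pivots)
prototypes = class_centroids(vectors, labels)
predicted = classify("aaba", pivots, prototypes)
```

The generated prototypes are points in the vector space and need not correspond to any actual string, which is why averaging becomes possible after the mapping even though a mean of strings is not directly defined in the original space.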
