Assessing the best edit in perturbation-based iterative refinement algorithms to compute the median string

Abstract. Many pattern recognition techniques, such as clustering, k-nearest neighbor classification and instance reduction algorithms, require prototypes to represent pattern classes. In many applications, instances are encoded as strings, for example in contour representations or in biological data such as DNA, RNA and protein sequences. Median strings have been used as representatives of a set of strings in different domains. Finding the median string is an NP-complete problem for several formulations. As an alternative, heuristic approaches have been proposed that iteratively refine an initial coarse solution by applying edit operations. We propose a novel algorithm that estimates the effect of a perturbation on the minimization of the expressions that define the median string, and that outperforms state-of-the-art heuristic approximations to the median string in terms of convergence speed. We present comparative experiments to validate these results.
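
For illustration, the generic perturbation-based refinement scheme mentioned in the abstract can be sketched as follows. This is a minimal baseline under stated assumptions, not the algorithm proposed in this paper: it assumes the unit-cost Levenshtein distance, starts from the set median (the input string with the lowest total distance to the others), and evaluates every single-edit perturbation exhaustively instead of estimating its effect. All function names (`levenshtein`, `total_distance`, `perturbations`, `approximate_median`) are hypothetical and chosen only for this example.

```python
from typing import Iterator, List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance computed with the Wagner-Fischer dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute ca by cb (free if equal)
            ))
        prev = curr
    return prev[-1]


def total_distance(candidate: str, strings: Sequence[str]) -> int:
    """Objective to minimize: sum of distances from the candidate to all strings."""
    return sum(levenshtein(candidate, s) for s in strings)


def perturbations(candidate: str, alphabet: List[str]) -> Iterator[str]:
    """Yield every string that is one edit operation away from the candidate."""
    for i in range(len(candidate)):
        yield candidate[:i] + candidate[i + 1:]              # deletion
        for c in alphabet:
            if c != candidate[i]:
                yield candidate[:i] + c + candidate[i + 1:]  # substitution
    for i in range(len(candidate) + 1):
        for c in alphabet:
            yield candidate[:i] + c + candidate[i:]          # insertion


def approximate_median(strings: Sequence[str], max_iters: int = 100) -> str:
    """Greedy refinement: apply the best single edit until no edit improves."""
    alphabet = sorted(set("".join(strings)))
    # Assumption: the set median serves as the initial coarse solution.
    best = min(strings, key=lambda s: total_distance(s, strings))
    best_cost = total_distance(best, strings)
    for _ in range(max_iters):
        step, step_cost = best, best_cost
        for cand in perturbations(best, alphabet):
            cost = total_distance(cand, strings)
            if cost < step_cost:
                step, step_cost = cand, cost
        if step_cost >= best_cost:  # no perturbation lowers the objective
            break
        best, best_cost = step, step_cost
    return best


if __name__ == "__main__":
    data = ["kitten", "sitten", "mitten", "kitchen"]
    median = approximate_median(data)
    print(median, total_distance(median, data))
```

Each iteration of this baseline computes a full edit distance to every input string for every candidate edit, which is what makes exhaustive evaluation expensive and what motivates ranking or estimating the effect of perturbations before applying them.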
