A new iterative algorithm for computing a quality approximate median of strings based on edit operations

This paper presents a new algorithm for computing an approximation to the median of a set of strings. The approximate median is obtained by successively improving a partial solution. In each iteration, the edit distance from the partial solution to every string in the set is computed, and the frequency of each edit operation at each position of the approximate median is recorded. A goodness index is then computed for each edit operation by multiplying its frequency by its cost. The operations are tested in decreasing order of goodness index to check whether applying them to the partial solution yields an improvement. As soon as one does, a new iteration begins from the new approximate median. The algorithm terminates when all operations have been examined without finding a better solution. Comparative experiments on Freeman chain codes encoding 2D shapes and on the Copenhagen chromosome database show that the approximate median strings obtained are similar in quality to those of benchmark approaches, while the algorithm converges much faster.
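
The iteration described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions the abstract does not settle: unit-cost Levenshtein operations (so the goodness index reduces to the frequency unless another `cost` function is supplied), the set median as the default initial partial solution, and a single optimal alignment per string when collecting operations. The names `edit_ops`, `apply_op`, `total_distance`, and `approximate_median` are hypothetical.

```python
from collections import Counter


def edit_ops(src, dst):
    """Unit-cost edit distance from src to dst, plus the edit operations
    (kind, position in src, symbol) along one optimal alignment."""
    n, m = len(src), len(dst)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (src[i - 1] != dst[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    # Backtrace from the bottom-right corner to recover the operations.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (src[i - 1] != dst[j - 1]):
            if src[i - 1] != dst[j - 1]:
                ops.append(("sub", i - 1, dst[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("del", i - 1, ""))
            i -= 1
        else:
            ops.append(("ins", i, dst[j - 1]))
            j -= 1
    return d[n][m], ops


def apply_op(s, op):
    """Apply a single edit operation to string s."""
    kind, pos, sym = op
    if kind == "sub":
        return s[:pos] + sym + s[pos + 1:]
    if kind == "del":
        return s[:pos] + s[pos + 1:]
    return s[:pos] + sym + s[pos:]  # insertion before position pos


def total_distance(candidate, strings):
    """Sum of edit distances from candidate to every string in the set."""
    return sum(edit_ops(candidate, s)[0] for s in strings)


def approximate_median(strings, start=None, cost=lambda op: 1):
    """Iteratively improve a partial solution (assumption: unit costs,
    set median as the default starting point)."""
    if start is None:
        start = min(strings, key=lambda s: total_distance(s, strings))
    current, best = start, total_distance(start, strings)
    improved = True
    while improved:
        improved = False
        # Count how often each operation appears across all alignments.
        freq = Counter()
        for s in strings:
            freq.update(edit_ops(current, s)[1])
        # Goodness index = frequency * cost, tested from highest to lowest.
        ranked = sorted(freq, key=lambda op: freq[op] * cost(op), reverse=True)
        for op in ranked:
            candidate = apply_op(current, op)
            score = total_distance(candidate, strings)
            if score < best:
                # Improvement found: restart from the new approximate median.
                current, best, improved = candidate, score, True
                break
    return current, best
```

For example, `approximate_median(["abc", "abc", "abc"], start="axc")` applies the substitution that all three alignments agree on and stops at `("abc", 0)`. Each candidate test recomputes the distance to the whole set, so this sketch favors clarity over speed.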
