Weighted Automata Computation of Edit Distances with Consolidations and Fragmentations

We study edit distances between strings based on character substitutions, insertions, and deletions, extended with consolidations and fragmentations. These two additional operations transform a sequence of characters into a single character and vice versa. They correspond to compression and expansion in Dynamic Time-Warping algorithms for speech recognition, and are also used in the formal analysis of written music. Such edit distances are not computable in general. We propose weighted automaton constructions that compute in polynomial time an edit distance taking into account both consolidations and deletions, or both fragmentations and insertions. Finally, we show that the optimal weight of sequences made of consolidations chained with fragmentations, in that order, is computable in polynomial time, whereas it is not computable if the order of fragmentations and consolidations is reversed.
