Linear-Space Computation of the Edit-Distance between a String and a Finite Automaton

The problem of computing the edit-distance between a string and a finite automaton arises in a variety of applications in computational biology, text processing, and speech recognition. This paper presents linear-space algorithms for computing the edit-distance between a string and an arbitrary weighted automaton over the tropical semiring, or an unambiguous weighted automaton over an arbitrary semiring. It also gives an efficient linear-space algorithm for finding an optimal alignment of a string and such a weighted automaton.
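As a rough illustration of the kind of computation involved, the sketch below computes the edit-distance between a string and a weighted automaton over the tropical semiring while storing only two columns of per-state distances, one for the current string position and one for the next. It is not the paper's algorithm. It assumes unit insertion, deletion, and substitution costs, non-negative transition weights, no epsilon transitions, and a hypothetical representation of the automaton as a list of `(src, label, weight, dst)` tuples. Transitions taken without consuming a string symbol are handled by a Dijkstra-style relaxation inside each column, which is what allows the automaton to contain cycles.

```python
import heapq

INF = float("inf")


def edit_distance_to_automaton(x, transitions, initial, finals, n_states):
    """Min over accepting paths of (path weight + unit-cost edit distance to x).

    transitions: list of (src, label, weight, dst), weights >= 0 (assumed format).
    Working space beyond the automaton itself is O(n_states).
    """
    out = [[] for _ in range(n_states)]
    for src, label, weight, dst in transitions:
        out[src].append((label, weight, dst))

    def close(col):
        # Relax moves that follow a transition without consuming a string
        # symbol (cost: transition weight + 1), Dijkstra-style.
        heap = [(d, q) for q, d in enumerate(col) if d < INF]
        heapq.heapify(heap)
        while heap:
            d, q = heapq.heappop(heap)
            if d > col[q]:
                continue  # stale heap entry
            for _, weight, r in out[q]:
                nd = d + weight + 1
                if nd < col[r]:
                    col[r] = nd
                    heapq.heappush(heap, (nd, r))
        return col

    # Column for the empty prefix of x.
    col = [INF] * n_states
    col[initial] = 0
    col = close(col)

    for ch in x:
        # Skip the string symbol ch and stay in the same state (cost 1).
        nxt = [d + 1 for d in col]
        # Consume ch along a transition: free on a match, cost 1 otherwise.
        for q, d in enumerate(col):
            if d == INF:
                continue
            for label, weight, r in out[q]:
                cost = d + weight + (0 if label == ch else 1)
                if cost < nxt[r]:
                    nxt[r] = cost
        col = close(nxt)  # the previous column is no longer needed

    return min((col[q] for q in finals), default=INF)


# Hypothetical example: an automaton accepting a b*, all weights 0.
AB_STAR = [(0, "a", 0, 1), (1, "b", 0, 1)]
print(edit_distance_to_automaton("abab", AB_STAR, 0, [1], 2))
```

Only the distance is returned here. Recovering an optimal alignment within the same space bound takes extra work, for example a divide-and-conquer scheme in the style of Hirschberg's algorithm, which this sketch does not attempt.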
