Estimation of probabilities for edit operations

This work discusses character confusion probabilities as an alternative way of obtaining initial estimates of edit costs for edit distances. Insertions require careful handling, and it is shown how improved estimates for them can be obtained. Furthermore, some previously proposed solutions based on joint events are discussed; these lead to inferior models for retrieving, from a given lexicon, the word that corresponds to the recognized string at hand.
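To make the link between confusion probabilities and edit costs concrete, the following minimal sketch assumes that each edit cost is the negative logarithm of a probability, so that the cheapest alignment is also the most probable one. The function names, the probability tables, and the floor value are hypothetical illustrations. The sketch does not reproduce the improved insertion estimates described in this work; it only shows where such estimates would enter.

```python
import math


def neg_log(p, floor=1e-12):
    """Turn a probability into an additive cost (assumed cost model: -log p)."""
    return -math.log(max(p, floor))  # floor avoids log(0) for unseen events


def weighted_edit_distance(source, target, sub_p, ins_p, del_p):
    """Cost of the cheapest alignment that turns `source` into `target`.

    sub_p[(a, b)]: assumed probability that character a is read as b
                   (a == b is a correct recognition)
    ins_p[b]:      assumed probability that a spurious b is inserted
    del_p[a]:      assumed probability that a is dropped
    """
    n, m = len(source), len(target)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):  # first column: delete every source character
        d[i][0] = d[i - 1][0] + neg_log(del_p.get(source[i - 1], 0.0))
    for j in range(1, m + 1):  # first row: insert every target character
        d[0][j] = d[0][j - 1] + neg_log(ins_p.get(target[j - 1], 0.0))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = source[i - 1], target[j - 1]
            d[i][j] = min(
                d[i - 1][j - 1] + neg_log(sub_p.get((a, b), 0.0)),  # match or substitution
                d[i - 1][j] + neg_log(del_p.get(a, 0.0)),           # deletion
                d[i][j - 1] + neg_log(ins_p.get(b, 0.0)),           # insertion
            )
    return d[n][m]


# Toy probabilities, invented purely for illustration.
sub_p = {("a", "a"): 0.9, ("a", "o"): 0.05, ("o", "o"): 0.9, ("o", "a"): 0.05}
ins_p = {"a": 0.01, "o": 0.01}
del_p = {"a": 0.04, "o": 0.04}

# Lexicon lookup: choose the word whose cheapest alignment to the
# recognized string has the lowest cost.
lexicon = ["ao", "oo"]
recognized = "aa"
best = min(lexicon, key=lambda w: weighted_edit_distance(w, recognized, sub_p, ins_p, del_p))
```

Taking the minimum over alignments approximates the probability of the best single alignment rather than the sum over all alignments, which is a simplification chosen here for brevity.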
