SYMBOLIC CHANNEL MODELLING FOR NOISY CHANNELS WHICH PERMIT ARBITRARY NOISE DISTRIBUTIONS

In this paper we present a new model for noisy channels that permit arbitrarily distributed substitution, deletion and insertion errors. Apart from its straightforward applications in string generation and recognition, the model also has potential applications in speech and unidimensional signal processing. The model is specified in terms of a noisy string generation technique. Let A be any finite alphabet and A* be the set of words over A. Given an arbitrary string U ∈ A*, we specify a stochastically consistent scheme by which this word can be transformed into any Y ∈ A*. This is achieved by specifying the process by which U is transformed through substitution, deletion and insertion operations. The scheme is shown to be functionally complete and stochastically consistent. The probability distributions for the respective operations can be completely arbitrary. Apart from presenting a channel in which every string in A* can potentially be generated, we also specify a technique by which Pr[Y|U], the probability of receiving Y given that U was transmitted, can be computed in cubic time. This procedure involves dynamic programming and is, to our knowledge, among the few non-trivial applications of dynamic programming that evaluate quantities involving relatively complex combinatorial expressions while simultaneously maintaining rigid probability consistency constraints.
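The abstract does not state the generation scheme or the recurrence, so the following is only a minimal sketch of how a cubic-time computation of Pr[Y|U] could look under one assumed scheme, not the paper's own formulation. The assumptions are: the number of insertions z is drawn from a distribution G; the z insertion slots are interleaved with the symbols of U uniformly over the C(|U|+z, z) possible arrangements; each inserted symbol is drawn from a fixed distribution; and each symbol of U is independently deleted or substituted. All function names, parameter names and numeric values are hypothetical.

```python
from itertools import product
from math import comb


def channel_probability(u, y, ins_count_pmf, ins_symbol_pmf, sub_pmf, del_pmf):
    """Pr[Y|U] under the ASSUMED scheme described in the lead-in."""
    n, m = len(u), len(y)
    # w[i][j][k]: summed weight of all edit sequences that consume u[:i],
    # emit y[:j] and use exactly k insertions (so k <= j).
    w = [[[0.0] * (m + 1) for _ in range(m + 1)] for _ in range(n + 1)]
    w[0][0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            for k in range(j + 1):
                if i == 0 and j == 0:
                    continue
                total = 0.0
                if i > 0:  # u[i-1] is deleted
                    total += w[i - 1][j][k] * del_pmf[u[i - 1]]
                if i > 0 and j > 0:  # u[i-1] is substituted by y[j-1]
                    total += w[i - 1][j - 1][k] * sub_pmf[u[i - 1]][y[j - 1]]
                if j > 0 and k > 0:  # y[j-1] is an inserted symbol
                    total += w[i][j - 1][k - 1] * ins_symbol_pmf[y[j - 1]]
                w[i][j][k] = total
    # Mix over the number of insertions; each arrangement of the z insertion
    # slots among the n input symbols has probability 1 / C(n + z, z).
    return sum(
        ins_count_pmf.get(z, 0.0) * w[n][m][z] / comb(n + z, z)
        for z in range(m + 1)
    )


# Hypothetical parameters over the alphabet {a, b}.
ALPHABET = "ab"
INS_COUNT = {0: 0.5, 1: 0.3, 2: 0.2}           # G(z): number of insertions
INS_SYMBOL = {"a": 0.5, "b": 0.5}              # distribution of inserted symbols
SUB = {"a": {"a": 0.8, "b": 0.1},              # substitution probabilities
       "b": {"a": 0.1, "b": 0.7}}
DEL = {"a": 0.1, "b": 0.2}                     # deletion probabilities

# Consistency check: at most 2 insertions, so |Y| <= |U| + 2.
U = "ab"
total_mass = sum(
    channel_probability(U, "".join(t), INS_COUNT, INS_SYMBOL, SUB, DEL)
    for length in range(len(U) + 3)
    for t in product(ALPHABET, repeat=length)
)
print(total_mass)
```

The table has (|U|+1)(|Y|+1)^2 entries, each filled in constant time, which is where the cubic cost comes from in this sketch. The final loop sums Pr[Y|U] over every Y reachable from U; under the assumed scheme this total should equal 1 up to floating-point rounding, which is the kind of probability consistency constraint the abstract refers to.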
