A Method for the Correction of Garbled Words Based on the Levenshtein Metric

In this paper we propose a new method for correcting garbled words based on the Levenshtein distance and the weighted Levenshtein distance. The method corrects not only substitution errors but also insertion and deletion errors. In simulations on nearly 1000 high-frequency English words, it achieved higher error-correction rates than any other method tried to date. A hardware realization of the method is possible, though rather complicated.
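As an illustration of the kind of correction described above, the following is a minimal sketch of nearest-word correction under a weighted Levenshtein distance. It is not the procedure evaluated in the paper: the function names `weighted_levenshtein` and `correct`, the unit default costs, and the small example dictionary are all assumptions made for this example. With the default costs, the distance reduces to the ordinary Levenshtein distance.

```python
from typing import Callable, Iterable


def weighted_levenshtein(
    source: str,
    target: str,
    sub_cost: Callable[[str, str], float] = lambda a, b: 0.0 if a == b else 1.0,
    ins_cost: Callable[[str], float] = lambda c: 1.0,
    del_cost: Callable[[str], float] = lambda c: 1.0,
) -> float:
    """Minimum total cost of the edits that turn `source` into `target`.

    The cost functions are hypothetical placeholders; unit costs give the
    plain (unweighted) Levenshtein distance.
    """
    # prev[j] holds the cost of turning the empty prefix of `source`
    # into target[:j], i.e. inserting the first j target characters.
    prev = [0.0]
    for ch in target:
        prev.append(prev[-1] + ins_cost(ch))

    for s in source:
        # Turning a non-empty source prefix into "" requires deletions only.
        curr = [prev[0] + del_cost(s)]
        for j, t in enumerate(target, start=1):
            curr.append(min(
                prev[j] + del_cost(s),         # delete s from the source
                curr[j - 1] + ins_cost(t),     # insert t into the source
                prev[j - 1] + sub_cost(s, t),  # substitute s by t (free if equal)
            ))
        prev = curr
    return prev[-1]


def correct(garbled: str, dictionary: Iterable[str], **costs) -> str:
    """Return the dictionary word closest to `garbled`.

    Ties are broken in favour of the word that appears first in `dictionary`.
    """
    return min(dictionary, key=lambda w: weighted_levenshtein(garbled, w, **costs))


# Assumed toy dictionary, for illustration only.
WORDS = ["which", "there", "would", "about"]

print(correct("whch", WORDS))    # one deletion error
print(correct("wuold", WORDS))   # two substitution errors
print(correct("abouut", WORDS))  # one insertion error
```

Passing a non-uniform `sub_cost`, for example one that charges less for character pairs a given channel tends to confuse, turns the same routine into a weighted variant. Suitable weights depend on the error source and are not specified here.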
