Approximate String Matching

Approximate matching of strings is reviewed with the aim of surveying techniques suitable for finding an item in a database when there may be a spelling mistake or other error in the keyword. The methods found are classified as addressing either equivalence or similarity problems. Equivalence problems are seen to be readily solved using canonical forms. For similarity problems, difference measures are surveyed, with a full description of the well-established dynamic programming method and of how it relates to the approach using probabilities and likelihoods. Searching for approximate matches in large sets using a difference function is seen to be an open problem still, though several promising ideas have been suggested. Approximate matching (error correction) during parsing is briefly reviewed.
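
As an illustration of the two problem classes, the following is a minimal sketch rather than the survey's own formulation. The function names, the particular canonical form (lowercase letters and digits only), and the unit costs for insertion, deletion, and substitution are all assumptions made for this example. The first function treats two keys as equivalent when their canonical forms are equal; the second computes a difference measure using the standard dynamic programming recurrence for edit distance.

```python
def canonical_form(key: str) -> str:
    """Hypothetical canonical form: keep only letters and digits, lowercased.

    Two keys are treated as equivalent when their canonical forms are equal.
    """
    return "".join(ch for ch in key.lower() if ch.isalnum())


def edit_distance(a: str, b: str) -> int:
    """Difference measure by dynamic programming, assuming unit costs.

    previous[j] holds the distance between the first i-1 characters of a
    and the first j characters of b; current[j] is the same for i characters.
    """
    previous = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]  # deleting all i characters of a to reach the empty string
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    # Equivalence: both keys reduce to the same canonical form.
    print(canonical_form("O'Brien") == canonical_form("o brien"))
    # Similarity: a small distance suggests a likely misspelling.
    print(edit_distance("kitten", "sitting"))
```

Comparing one keyword against every stored key in this way costs time proportional to the product of the two string lengths for each comparison, which is why searching large sets with a difference function is the harder problem.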
