Searching for historical word-forms in a database of 17th-century English text using spelling-correction methods

This paper discusses the application of algorithmic spelling-correction techniques to the identification of those words in a database of 17th-century English text that are most similar to a query word in modern English. The experiments have used n-gram matching, non-phonetic coding and dynamic programming methods for spelling correction, and have demonstrated that high-recall searches can be carried out, although some of the searches are very demanding of computational resources. The methods are, in principle, applicable to historical texts in many languages and from many different periods.
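To make two of the techniques named above concrete, the following is a minimal sketch, not the paper's implementation. It assumes a Dice coefficient over padded character bigrams as the n-gram measure and plain Levenshtein distance as the dynamic programming measure. The function names and the tiny vocabulary in the usage note are hypothetical, chosen only for illustration.

```python
def ngrams(word, n=2):
    """Return the set of character n-grams of a word, padded with spaces."""
    padded = " " * (n - 1) + word.lower() + " " * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def dice_similarity(a, b, n=2):
    """Dice coefficient over the n-gram sets of two words (0.0 to 1.0)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def edit_distance(a, b):
    """Levenshtein distance, computed by dynamic programming row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion from a
                current[j - 1] + 1,      # insertion into a
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def rank_candidates(query, vocabulary, top_k=3):
    """Rank historical word-forms by n-gram similarity to a modern query.

    Ties on similarity are broken by smaller edit distance. This ordering
    is an assumption of the sketch, not a rule taken from the paper.
    """
    ranked = sorted(
        vocabulary,
        key=lambda w: (-dice_similarity(query, w), edit_distance(query, w)),
    )
    return ranked[:top_k]
```

As a usage example, `rank_candidates("music", ["musick", "musique", "magick", "physick"])` scores each hypothetical historical form against the modern query word. Scanning the whole vocabulary for every query in this way is one reason such searches can be demanding of computational resources.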
