Paper information - Computer detection of typographical errors

Computer detection of typographical errors

Describes a computer program, written for the UNIX time-sharing system, that reduces by several orders of magnitude the task of finding words in a document that contain typographical errors. The program is adaptive in the sense that it uses statistics from the document itself for its analysis. In a first pass through the document, a table of digram and trigram frequencies is prepared. The second pass breaks out individual words and compares the digrams and trigrams in each word with the frequencies from the table. Each word is given an index that reflects the hypothesis that the trigrams in the word were produced by the same source that produced the trigram table. The words are sorted in decreasing order of their indices and printed. Printing is suppressed for words appearing in a table of 2726 common technical English words.
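The two-pass procedure described above can be illustrated with a minimal Python sketch. The abstract does not state the formula for the index, so the score used here (half the sum of the log digram counts minus the log trigram count, averaged over the word's trigrams, with add-one smoothing) is an assumption made for illustration, not the program's actual statistic. The function names `ngrams`, `build_tables`, `peculiarity`, and `rank_words`, the space padding at word boundaries, and the lowercase-letters-only tokenizer are likewise hypothetical.

```python
import math
import re
from collections import Counter


def ngrams(word, n):
    """Return the n-grams of a word padded with one space on each side."""
    padded = f" {word} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_tables(words):
    """First pass: count digram and trigram frequencies over the document."""
    digrams, trigrams = Counter(), Counter()
    for word in words:
        digrams.update(ngrams(word, 2))
        trigrams.update(ngrams(word, 3))
    return digrams, trigrams


def peculiarity(word, digrams, trigrams):
    """Assumed index: large when a word's trigrams are rare relative to
    the digrams they contain. Add-one smoothing keeps the logs finite."""
    scores = []
    for tri in ngrams(word, 3):
        left, right = tri[:2], tri[1:]
        score = 0.5 * (math.log(digrams[left] + 1) + math.log(digrams[right] + 1))
        score -= math.log(trigrams[tri] + 1)
        scores.append(score)
    return sum(scores) / len(scores)


def rank_words(text, common_words=frozenset()):
    """Second pass: score each distinct word, drop common words, and
    return the rest in decreasing order of index (ties alphabetical)."""
    words = re.findall(r"[a-z]+", text.lower())
    digrams, trigrams = build_tables(words)
    candidates = set(words) - set(common_words)
    scored = [(peculiarity(w, digrams, trigrams), w) for w in candidates]
    return [w for _, w in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]
```

Calling `rank_words(text, common_words=...)` on a document returns its distinct words with the most statistically unusual ones first, so a proofreader would only need to inspect the top of the list. The `common_words` argument stands in for the table of common technical English words mentioned in the abstract; its contents are not given there.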

[1] R. A. Leibler, et al. On Information and Sufficiency, 1951.

[2] H. Kucera, et al. Computational analysis of present-day American English, 1967.

[3] L. Stein, et al. Probability and the Weighing of Evidence, 1950.