Case-sensitive letter and bigram frequency counts from large-scale English corpora

We tabulated upper- and lowercase letter frequency using several large-scale English corpora (∼183 million words in total). The results indicate that the relative frequencies for upper- and lowercase letters are not equivalent. We report a letter-naming experiment in which uppercase frequency predicted response time to uppercase letters better than did lowercase frequency. Tables of case-sensitive letter and bigram frequency are provided, including common nonalphabetic characters. Because subjects are sensitive to frequency relationships among letters, we recommend that experimenters use case-sensitive counts when constructing stimuli from letters.
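The kind of tabulation described above can be illustrated with a minimal sketch. It is not the authors' procedure: the function names `count_letters_and_bigrams` and `relative_frequencies` are hypothetical, and the choice to stop bigrams at whitespace is an assumption made here for illustration. The sketch only shows why folding case before counting would hide the difference between uppercase and lowercase distributions.

```python
from collections import Counter


def count_letters_and_bigrams(text):
    """Count single characters and adjacent pairs, keeping case distinct.

    Assumption (not from the source): whitespace is not counted and
    bigrams do not span whitespace.
    """
    letters = Counter()
    bigrams = Counter()
    prev = None
    for ch in text:
        if ch.isspace():
            prev = None  # reset so a bigram never crosses a space
            continue
        letters[ch] += 1  # 'T' and 't' are separate keys
        if prev is not None:
            bigrams[prev + ch] += 1
        prev = ch
    return letters, bigrams


def relative_frequencies(counts, keep):
    """Normalize counts over the characters for which keep(ch) is true."""
    kept = {ch: n for ch, n in counts.items() if keep(ch)}
    total = sum(kept.values())
    if total == 0:
        return {}
    return {ch: n / total for ch, n in kept.items()}


if __name__ == "__main__":
    letters, bigrams = count_letters_and_bigrams("The cat. The Dog.")
    # Relative frequencies computed separately within each case.
    print(relative_frequencies(letters, str.isupper))
    print(relative_frequencies(letters, str.islower))
    print(bigrams.most_common(3))
```

Normalizing within each case, rather than over all characters, is one possible way to compare the two distributions; nonalphabetic characters such as the period remain in the raw counts and can be reported separately.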
