The predictability of letters in written English

We show that the predictability of letters in written English texts depends strongly on their position within the word. The first letters of a word are usually the hardest to predict. This agrees with the intuitive notion that words are well-defined subunits of written language, with much weaker correlations across these units than within them. It implies that the average entropy of a letter deep inside a word is roughly 4–5 times smaller than the entropy of the first letter.
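
The effect can be illustrated with a simple positional entropy estimate. The following is a minimal sketch (not the paper's estimator): it computes the plug-in conditional entropy of the k-th letter of a word given its first k-1 letters from a plain-text corpus. The corpus path, the restriction to lowercase a–z, and the cutoff max_pos are illustrative assumptions.

```python
# Sketch: estimate H(X_k | X_1..X_{k-1}), the conditional entropy of the k-th
# letter of a word given its prefix, from a plain-text corpus (assumed file).
import math
import re
from collections import Counter, defaultdict

def positional_entropies(text, max_pos=6):
    words = re.findall(r"[a-z]+", text.lower())
    joint = defaultdict(Counter)     # joint[k][(prefix, letter)]: counts of (prefix, k-th letter)
    marginal = defaultdict(Counter)  # marginal[k][prefix]: counts of prefixes of length k-1
    for w in words:
        for k in range(1, min(len(w), max_pos) + 1):
            prefix, letter = w[:k - 1], w[k - 1]
            joint[k][(prefix, letter)] += 1
            marginal[k][prefix] += 1
    # H(X_k | prefix) = -sum over (prefix, letter) of p(prefix, letter) * log2 p(letter | prefix)
    entropies = {}
    for k in range(1, max_pos + 1):
        n = sum(joint[k].values())
        if n == 0:
            continue
        h = 0.0
        for (prefix, _), c in joint[k].items():
            p_joint = c / n
            p_cond = c / marginal[k][prefix]
            h -= p_joint * math.log2(p_cond)
        entropies[k] = h
    return entropies

if __name__ == "__main__":
    # "corpus.txt" is a placeholder for any plain-text English corpus.
    sample = open("corpus.txt", encoding="utf-8").read()
    for k, h in positional_entropies(sample).items():
        print(f"position {k}: {h:.2f} bits")
```

On a large corpus the estimated entropy drops markedly from position 1 to deeper positions, in line with the stated result; note that this naive plug-in estimator is biased downward at deep positions when prefix counts are sparse, so a finite corpus tends to overstate the drop.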
