On the statistics of spoken English

The importance of using linguistic information for automatic speech recognition and speech synthesis is becoming increasingly apparent as our knowledge of speech grows. Among the kinds of linguistic rules that could be used for this purpose are the statistical relationships between the various units of the language. This paper describes a computer-based statistical analysis of a considerable body of English speech. The principles for selecting suitable texts are discussed. The speech material for the present study was taken from several “Phonetic Readers” compiled for the teaching of English; their phonetic transcriptions were punched on IBM cards for computer processing. Counts were obtained for phoneme monogram and digram frequencies, for word length measured in phonemes and in syllables, and for other quantities. Stress was taken into account, and many of the statistics were obtained separately for stressed and unstressed cases. Particular attention has been...
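The monogram, digram and word-length counts described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's program: the input format (each utterance as a list of words, each word given as a list of phoneme symbols plus a stressed/unstressed flag), the vowel set used to approximate syllable counts, and the choice to let digrams run across word boundaries are all hypothetical choices made for the example.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical input format (not the paper's IBM card layout):
# a word is (list of phoneme symbols, True if the word is stressed).
Word = Tuple[List[str], bool]

# Assumed vowel inventory, used only to approximate syllable counts by
# counting vowel nuclei; diphthongs are written as single symbols here.
VOWELS = {"i:", "ɪ", "e", "æ", "ʌ", "ɑ:", "ɒ", "ɔ:", "ʊ", "u:", "ɜ:", "ə",
          "eɪ", "aɪ", "ɔɪ", "əʊ", "aʊ", "ɪə", "eə", "ʊə"}


def phoneme_stats(utterances: List[List[Word]]):
    """Count phoneme monograms and digrams, and tabulate word length in
    phonemes and in syllables separately for stressed and unstressed words."""
    monograms: Counter = Counter()
    digrams: Counter = Counter()
    length_phonemes: Dict[bool, Counter] = {True: Counter(), False: Counter()}
    length_syllables: Dict[bool, Counter] = {True: Counter(), False: Counter()}

    for utterance in utterances:
        # Running phoneme string of the whole utterance (assumption:
        # digrams are allowed to span word boundaries).
        seq = [p for phonemes, _ in utterance for p in phonemes]
        monograms.update(seq)
        digrams.update(zip(seq, seq[1:]))  # adjacent phoneme pairs

        for phonemes, stressed in utterance:
            length_phonemes[stressed][len(phonemes)] += 1
            n_syllables = sum(1 for p in phonemes if p in VOWELS)
            length_syllables[stressed][n_syllables] += 1

    return monograms, digrams, length_phonemes, length_syllables


if __name__ == "__main__":
    # "the cat sat" with "the" unstressed and "cat", "sat" stressed.
    example = [[(["ð", "ə"], False), (["k", "æ", "t"], True), (["s", "æ", "t"], True)]]
    m, d, lp, ls = phoneme_stats(example)
    print(m.most_common(2), d[("æ", "t")], dict(lp[True]), dict(ls[False]))
```

Relative frequencies follow by dividing each count by the total, for example `d[pair] / sum(d.values())`. Keeping stressed and unstressed words in separate counters mirrors the abstract's separate treatment of the two cases without committing to how the original study encoded stress.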