Statistical study of word beginnings and endings in a Romanian literary corpus

The paper investigates the statistical structure of the letters and letter digrams with which words begin and end, as well as of the trigrams that link two successive words. The investigation is carried out on a printed Romanian literary corpus totaling about 12.5 million words. The impact of orthography and punctuation marks on the language model assigned to word beginnings and endings is also considered.
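
The kind of counts involved can be illustrated with a minimal Python sketch. It is not the procedure used in the paper: the function names `boundary_counts` and `relative_frequencies` are hypothetical, words are taken to be runs of Unicode letters (so punctuation is discarded and a hyphenated form splits into two words), and a linking trigram is assumed to be the last letter of a word, a space, and the first letter of the next word.

```python
import re
from collections import Counter

# Assumption: a "word" is a maximal run of Unicode letters, so digits,
# punctuation and hyphens act as separators and are dropped.
WORD_RE = re.compile(r"[^\W\d_]+")


def boundary_counts(text):
    """Count letters and digrams at word boundaries, plus linking trigrams."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    return {
        "first_letter": Counter(w[0] for w in words),
        "last_letter": Counter(w[-1] for w in words),
        # Digrams are only defined for words of at least two letters.
        "first_digram": Counter(w[:2] for w in words if len(w) >= 2),
        "last_digram": Counter(w[-2:] for w in words if len(w) >= 2),
        # Assumed definition: last letter + space + first letter of next word.
        "link_trigram": Counter(
            a[-1] + " " + b[0] for a, b in zip(words, words[1:])
        ),
    }


def relative_frequencies(counter):
    """Turn raw counts into relative frequencies (empty dict if no data)."""
    total = sum(counter.values())
    if total == 0:
        return {}
    return {key: value / total for key, value in counter.items()}


if __name__ == "__main__":
    sample = "Ana are mere. Ion are pere."
    counts = boundary_counts(sample)
    print(counts["first_letter"].most_common(3))
    print(relative_frequencies(counts["last_digram"]))
```

Because this sketch drops punctuation before counting, it cannot measure the effect of punctuation marks on the boundary statistics; studying that effect would require keeping those marks as symbols in the tokenization step.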