General framework for mining, processing and storing large amounts of electronic texts for language modeling purposes