A multilingual corpus toolkit