Natural Language Identification using Corpus-Based Models

This paper describes three approaches to the task of automatically identifying the language in which a text is written. We conducted experiments to compare the success of each approach in identifying languages from a set of texts in Dutch/Frisian, English, French, Gaelic (Irish), German, Italian, Portuguese, Serbo-Croat and Spanish...