N-gram models for language detection

In this document we report a set of experiments using n-gram language models for automatic language detection of text. We begin with a brief explanation of the concepts and mathematics behind n-gram language models and discuss some applications and domains in which they are widely used, together with an overview of related work in language detection. We then describe the resources used in the experiments, namely a subset of the Europarl corpus and the SRILM toolkit, and walk through a toy experiment that explains our methodology in detail. Finally, we evaluate the performance of different language models and parameters using a precision measure based on the perplexity of a text according to each model. We conclude that n-gram models are indeed a simple and efficient tool for automatic language detection.
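The detection criterion described above, scoring a text under one model per language and choosing the language whose model yields the lowest perplexity, can be illustrated with a minimal character-level sketch. The class names, the add-one smoothing, and the toy training corpora below are our own illustrative choices; the experiments in this document use SRILM-trained models rather than this hand-rolled one.

```python
import math
from collections import Counter

def ngrams(text, n=3):
    """Character n-grams with left padding so every character gets a context."""
    padded = " " * (n - 1) + text
    return [padded[i:i + n] for i in range(len(text))]

class CharNgramLM:
    """Character n-gram language model with add-one (Laplace) smoothing."""
    def __init__(self, training_text, n=3):
        self.n = n
        self.counts = Counter(ngrams(training_text, n))
        self.context_counts = Counter(g[:-1] for g in ngrams(training_text, n))
        self.vocab = set(training_text) | {" "}

    def logprob(self, gram):
        # P(char | context) with add-one smoothing over the character vocabulary
        num = self.counts[gram] + 1
        den = self.context_counts[gram[:-1]] + len(self.vocab)
        return math.log2(num / den)

    def perplexity(self, text):
        grams = ngrams(text, self.n)
        log_likelihood = sum(self.logprob(g) for g in grams)
        return 2 ** (-log_likelihood / len(grams))

def detect(text, models):
    # Pick the language whose model assigns the lowest perplexity to the text.
    return min(models, key=lambda lang: models[lang].perplexity(text))

# Toy "corpora" standing in for real training data such as Europarl.
models = {
    "en": CharNgramLM("the quick brown fox jumps over the lazy dog " * 20),
    "es": CharNgramLM("el veloz zorro marron salta sobre el perro perezoso " * 20),
}
print(detect("the dog jumps", models))  # expected: en
```

The same decision rule underlies the perplexity-based precision measure evaluated later: a text is counted as correctly detected when its true language's model is the perplexity minimizer.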