LINGUA: The Language-Independent Neighbourhood Generator of the University of Alberta

LINGUA (Language-Independent Neighbourhood Generator of the University of Alberta) is a free, platform-independent Java program comprising a set of tools developed for three purposes: to turn corpora into frequency dictionaries; to calculate orthographic neighbourhood and N-gram counts; and to generate plausible nonwords algorithmically. As its name suggests, it was designed to be language-independent and can handle input corpora in a wide range of text encodings. In this article we describe the LINGUA tools and how to use them. Because LINGUA requires a large corpus, we also include a detailed tutorial on building a corpus in a specific language by harvesting text from the World Wide Web. LINGUA is freely available from: http://www.psych.ualberta.ca/~westburylab/
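
To make the first two purposes concrete, the following Python sketch illustrates the general idea of a frequency dictionary, an orthographic neighbourhood, and N-gram counts on a toy corpus. It is not LINGUA's code, and its definitions are assumptions made for illustration: a token is any run of letters, an orthographic neighbour is a word of the same length that differs in exactly one letter position (the common substitution-neighbour definition), and N-grams are counted over word types rather than tokens. The function names are hypothetical and do not correspond to anything in LINGUA.

```python
from collections import Counter
import re


def frequency_dictionary(text):
    """Count how often each lower-cased word occurs in the text.

    Assumption: a token is any run of Unicode letters.
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(tokens)


def orthographic_neighbours(word, lexicon):
    """Return the words in the lexicon that have the same length as
    `word` and differ from it in exactly one letter position."""
    return sorted(
        w for w in lexicon
        if len(w) == len(word)
        and sum(a != b for a, b in zip(w, word)) == 1
    )


def ngram_counts(lexicon, n):
    """Count letter N-grams over word types (each word counted once)."""
    counts = Counter()
    for w in lexicon:
        for i in range(len(w) - n + 1):
            counts[w[i:i + n]] += 1
    return counts


# Toy corpus, invented for this example.
corpus = "The cat sat on the mat. A cot is not a cat."
freq = frequency_dictionary(corpus)
neighbours = orthographic_neighbours("cat", freq)
bigrams = ngram_counts(freq, 2)
```

On this toy corpus, `freq["cat"]` is 2, the neighbours of "cat" are "cot", "mat" and "sat", and the bigram "at" occurs in three word types. A real corpus would also need a decision about the input text encoding, which this sketch leaves to the caller by working on already-decoded strings.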