Stable Lexical Marker Analysis: a corpus-based identification of lexical variation

Research questions that deal with mutual intelligibility or that investigate language attitudes in pluricentric languages rely on a correct assessment of the loci of divergence, of which differences in word choice are among the most salient. Quantitative corpus-based methods can help researchers identify this lexical variation. This paper focuses on a language-independent method, Stable Lexical Marker Analysis (SLMA; Speelman et al., 2008), for finding variety-specific words in representative corpora. The method builds on the keyword-analysis approach (Scott, 1997) but allows a graded rather than a categorical assessment of markedness, and it includes a mechanism to circumvent topical bias in corpora. The paper discusses further improvements to SLMA for dealing with gradedness and offers a quantitative and qualitative analysis of results from a case study identifying lexical markers of Netherlandic and Belgian Dutch.
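The following is a minimal sketch of how a stability-based keyword analysis can yield a graded score and dampen topical bias. It is an illustration under stated assumptions, not the implementation of Speelman et al. (2008). Each variety's corpus is assumed to be split into subcorpora (for example, per source or per text). Every subcorpus of variety A is compared with every subcorpus of variety B using a log-likelihood (G²) keyness test, one common choice in keyword analysis. The word's score is the net share of comparisons in which it comes out as a significant keyword for A rather than for B. The names `log_likelihood`, `stability` and `G2_CRITICAL`, the 3.84 threshold and the signed scoring scheme are all hypothetical choices made for this example. A word that is frequent in just one topical subcorpus wins only a few of the pairwise comparisons, so its score stays low.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical threshold: chi-square critical value for df=1, p < 0.05.
G2_CRITICAL = 3.84


def log_likelihood(a, b, c, d):
    """G^2 for a 2x2 table.

    a, b: frequency of the word in corpus 1 and corpus 2
    c, d: frequency of all other tokens in corpus 1 and corpus 2
    """
    total = a + b + c + d
    g2 = 0.0
    # (observed, row total, column total) for each of the four cells
    cells = ((a, a + b, a + c), (b, a + b, b + d),
             (c, c + d, a + c), (d, c + d, b + d))
    for observed, row, col in cells:
        expected = row * col / total
        if observed > 0:  # 0 * log(0) is taken as 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def stability(word, subcorpora_a, subcorpora_b):
    """Hypothetical graded marker score in [-1, 1].

    +1: significant keyword for variety A in every pairwise comparison,
    -1: significant keyword for variety B in every comparison,
     0: never significant (or significant equally often in both directions).
    """
    pairs = list(product(subcorpora_a, subcorpora_b))
    score = 0
    for corpus_a, corpus_b in pairs:
        size_a, size_b = sum(corpus_a.values()), sum(corpus_b.values())
        freq_a, freq_b = corpus_a[word], corpus_b[word]
        if freq_a + freq_b == 0:
            continue  # word absent from both; counts as a non-significant pair
        g2 = log_likelihood(freq_a, freq_b, size_a - freq_a, size_b - freq_b)
        if g2 >= G2_CRITICAL:
            # The direction of the difference in relative frequency decides
            # which variety the word marks in this comparison.
            score += 1 if freq_a / size_a > freq_b / size_b else -1
    return score / len(pairs)
```

A toy usage example with invented counts (not data from the case study), treating Netherlandic Dutch as variety A and Belgian Dutch as variety B:

```python
nl = [Counter({"mobieltje": 10, "de": 90}), Counter({"mobieltje": 8, "de": 92})]
be = [Counter({"gsm": 10, "de": 90}), Counter({"gsm": 12, "de": 88})]
print(stability("mobieltje", nl, be))  # 1.0
print(stability("gsm", nl, be))        # -1.0
print(stability("de", nl, be))         # 0.0
```

Counting wins over all pairs of subcorpora, instead of running a single test on the pooled corpora, is what makes the score graded. It also means that one topic-heavy text cannot make a word look like a strong marker on its own.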