Degrees of semantic control in measuring aggregated lexical distances

The goal of the current study is to show how aggregated lexical variation can be studied by means of corpus-based techniques that differ in their degree of semantic control. In current variationist research, one finds many corpus-based studies of phonological or morphological variation. At first sight it is remarkable, then, that corpus-based studies of lexical variation are rare, especially in comparison with dialectology, where the study of lexical variation belongs to the main research goals. In contrast to corpus-based variationist studies, however, the dialectological account of lexical variation is largely restricted to elicited data, as stored in well-known dialect atlases. The current study therefore sets out to show how this lack of corpus-based studies of lexical variation can be remedied, while taking into account possible issues of lexical semantic complexity. In the introduction to the paper, we make two points. First, we explain why studies of phonological and morphological variation abound while studies of lexical variation are scarce. Second, we shed a different light on what can be understood by lexical variation from a corpus-linguistic point of view.
