A statistical method for the identification and aggregation of regional linguistic variation

This paper introduces a method for the analysis of regional linguistic variation. The method combines three statistical techniques (spatial autocorrelation, factor analysis, and cluster analysis) to identify individual and common patterns of spatial clustering in a set of linguistic variables measured over a set of locations. To demonstrate its application, the method is used to analyze regional variation in the values of 40 continuously measured, high-frequency lexical alternation variables in a 26-million-word corpus of letters to the editor representing 206 cities from across the United States.
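
As an illustration of the first of the three techniques, the sketch below computes a global spatial autocorrelation score for one variable measured over a set of locations. It assumes global Moran's I with a binary neighbor matrix; the abstract does not say which autocorrelation statistic or weighting scheme the paper uses, so the function name `morans_i`, the weight matrix, and the toy values are all hypothetical.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I for one variable over n locations.

    values  : length-n sequence of measurements (one per location)
    weights : n x n spatial weight matrix (assumed here, not from the paper)
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float).copy()
    np.fill_diagonal(w, 0.0)          # a location is not its own neighbor
    z = x - x.mean()                  # deviations from the mean
    # I = (n / sum of weights) * (weighted cross-products / total variance)
    return (len(x) / w.sum()) * (z @ w @ z) / (z @ z)

# Toy example: four locations on a line, each adjacent to its neighbors.
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])

clustered = morans_i([1, 1, 5, 5], w)    # similar values sit together
alternating = morans_i([1, 5, 1, 5], w)  # neighbors always differ

print(clustered, alternating)
```

A positive score indicates that nearby locations tend to share similar values (regional clustering), while a negative score indicates that neighbors tend to differ. In a pipeline like the one described, a statistic of this kind would be computed per variable before the factor and cluster analyses.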
