Statistics for Aggregate Variationist Analyses

This chapter presents methods for analyzing large aggregates of linguistic data, together with techniques that seek to identify groups and their common speech habits. Jean Séguy examined the distribution of aggregate differences in Gascony as a function of geographic distance and found a sub-linear curve, which John Nerbonne argues contradicts Peter Trudgill's gravity theory of dialect divergence. These early works motivate the aggregate perspective. Regression analyses have helped characterize the relation between geographic distance and aggregate linguistic differences. Another approach, generalized additive modeling (GAM), directly incorporates the complex influence of geography on the aggregate patterns while simultaneously assessing the importance of other social and lexical predictors.
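The regression idea can be illustrated with a minimal sketch. It does not use the chapter's data or reproduce its analyses: the distances below are synthetic, generated under the assumption of a logarithmic (hence sub-linear) relation, and the names `geo_km`, `ling_dist`, and `r_squared` are hypothetical. The sketch only shows how one might compare a linear fit with a logarithmic fit of aggregate linguistic distance on geographic distance.

```python
import numpy as np


def r_squared(x, y):
    """Share of variance in y explained by an ordinary least-squares line on x."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (intercept + slope * x)
    return 1.0 - residuals.var() / y.var()


# Synthetic site pairs (an assumption for illustration, not real dialect data):
# linguistic distance grows with the logarithm of geographic distance, plus noise.
rng = np.random.default_rng(0)
geo_km = rng.uniform(5, 500, size=400)
ling_dist = 0.1 * np.log(geo_km) + rng.normal(0, 0.03, size=400)

# Compare a straight-line fit with a fit on log-transformed geographic distance.
r2_linear = r_squared(geo_km, ling_dist)
r2_log = r_squared(np.log(geo_km), ling_dist)

print(f"R^2, linear in distance:      {r2_linear:.3f}")
print(f"R^2, logarithmic in distance: {r2_log:.3f}")
```

On data generated this way the logarithmic fit should explain more variance than the linear one, which is the kind of pattern a sub-linear curve describes. With real dialect distances the pairwise observations are not independent, so significance is usually assessed with a permutation procedure such as a Mantel test rather than with standard regression p-values.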
