How to do Linguistics with R: Data exploration and statistical analysis

This book provides linguists with a statistical toolkit for exploring and analysing linguistic data. It employs R, a free software environment for statistical computing that is increasingly popular among linguists. How to do Linguistics with R: Data exploration and statistical analysis is unique in its scope: it covers a wide range of classical and cutting-edge statistical methods, including different flavours of regression analysis and ANOVA, random forests and conditional inference trees, as well as specifically linguistic approaches such as Behavioural Profiles, Vector Space Models and various measures of association between words and constructions. The statistical topics are presented comprehensively, but without too much technical detail, and are illustrated with linguistic case studies that answer non-trivial research questions. The book also demonstrates how to visualize linguistic data in attractive, informative graphs, using tools such as the popular ggplot2 system and Google visualization tools.
