50-something years of work on collocations: What is or should be next …

This paper explores ways in which research into collocation should be improved. After a discussion of the parameters underlying the notion of collocation, the paper has three main parts. First, I argue that corpus linguistics would benefit from taking more seriously the understudied fact that collocations are not necessarily symmetric, contrary to what most association measures imply. I also introduce an association measure from the associative learning literature that can identify asymmetric collocations, and I show that it distinguishes well between collocations with high and low association strengths. Second, I summarize some advantages of this measure and brainstorm about ways in which it can help re-examine previous studies as well as support further applications. Finally, I adopt a broader perspective and discuss a variety of ways in which all association measures – directional or not – in corpus linguistics should be improved in order for us to obtain better and more reliable results.
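
To make the notion of a directional association measure concrete, the following is a minimal sketch. It assumes that the measure in question is ΔP, a contingency measure from the associative learning literature; the abstract itself does not name the measure, so this identification is an assumption. The function name `delta_p`, the cell labels `a`, `b`, `c`, `d`, and all counts are hypothetical and serve only as illustration.

```python
def delta_p(a, b, c, d):
    """Return the two directional Delta P values for a 2x2 co-occurrence table.

    Cell labels (hypothetical naming):
      a = word1 present, word2 present
      b = word1 present, word2 absent
      c = word1 absent,  word2 present
      d = word1 absent,  word2 absent
    """
    # How much more likely is word2 when word1 is present than when it is absent?
    word2_given_word1 = a / (a + b) - c / (c + d)
    # How much more likely is word1 when word2 is present than when it is absent?
    word1_given_word2 = a / (a + c) - b / (b + d)
    return word2_given_word1, word1_given_word2


# Made-up counts: word1 is rare and almost always followed by word2,
# whereas word2 is frequent and mostly occurs without word1.
forward, backward = delta_p(a=80, b=20, c=920, d=98980)
print(f"Delta P(word2 | word1) = {forward:.3f}")
print(f"Delta P(word1 | word2) = {backward:.3f}")
```

With these invented counts the two directions differ considerably: word1 is a strong cue for word2, but not vice versa. A symmetric measure would assign the pair a single score and hide this difference.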
