Lexical gravity across varieties of English: An ICE-based study of n-grams in Asian Englishes

In our earlier work on three Asian Englishes and British English, we showed how lexico-syntactic co-occurrence preferences for three argument structure constructions revealed differences between varieties that correlated well with Schneider’s (2003, 2007) model of evolutionary stages. Here, we turn to lexical co-occurrence preferences and investigate whether, and to what degree, n-grams distinguish between different modes and varieties in the same components of the International Corpus of English. Our approach to n-grams differs from previous work in that we use neither raw frequencies nor (problematic) MI-values but the newly proposed measure of lexical gravity (cf. Daudaravicius & Marcinkevicienė 2004), which takes type frequencies into consideration. We show how lexical gravity can be extended to handle n-grams with n ≥ 3 and apply this method to our n-gram data; in addition, we suggest a new concept for describing the tendency of a word to occur in significant n-grams: lexical stickiness.
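To illustrate what it means for a measure to take type frequencies into consideration, the following is a minimal sketch of bigram gravity. It assumes the formulation commonly attributed to Daudaravicius & Marcinkevicienė (2004), in which each direction of the bigram contributes the log of the bigram frequency, multiplied by the number of distinct neighbouring types and divided by the word's token frequency. The function name `bigram_gravity` and the toy token list are hypothetical, and the sketch does not cover the extension to n ≥ 3 or lexical stickiness, neither of which is specified in this abstract.

```python
import math
from collections import Counter, defaultdict


def bigram_gravity(tokens):
    """Return {(w1, w2): gravity} for every adjacent word pair in tokens.

    Assumed formulation (not confirmed by the abstract itself):
        G(w1, w2) = log(f(w1, w2) * n_right(w1) / f(w1))
                  + log(f(w1, w2) * n_left(w2)  / f(w2))
    where n_right(w1) is the number of distinct word types following w1
    and n_left(w2) is the number of distinct word types preceding w2.
    """
    unigram_freq = Counter(tokens)
    bigram_freq = Counter(zip(tokens, tokens[1:]))

    # Collect the distinct types seen to the right and left of each word.
    right_types = defaultdict(set)
    left_types = defaultdict(set)
    for w1, w2 in bigram_freq:
        right_types[w1].add(w2)
        left_types[w2].add(w1)

    gravity = {}
    for (w1, w2), f12 in bigram_freq.items():
        forward = math.log(f12 * len(right_types[w1]) / unigram_freq[w1])
        backward = math.log(f12 * len(left_types[w2]) / unigram_freq[w2])
        gravity[(w1, w2)] = forward + backward
    return gravity


# Toy usage: a hypothetical ten-token "corpus".
toy_tokens = "of course he said of course she said of the".split()
toy_gravity = bigram_gravity(toy_tokens)
for pair, value in sorted(toy_gravity.items(), key=lambda kv: -kv[1]):
    print(pair, round(value, 3))
```

Unlike raw frequency, the score under this assumed formulation depends on how many different types a word combines with as well as on how often the pair itself occurs.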

[1] N. Ellis, et al. An Academic Formulas List: New Methods in Phraseology Research, 2010.

[2] Eytan Ruppin, et al. Unsupervised learning of natural languages, 2006.

[3] E. Schneider. Postcolonial English: Varieties around the World, 2007.

[4] Philip Shaw, et al. Verb complementation patterns in Indian Standard English, 2003.

[5] R. Shillcock, et al. Eye Movements Reveal the On-Line Computation of Lexical Probabilities During Reading, 2003, Psychological Science.

[6] Kenji Kita, et al. A comparative study of automatic extraction of collocations from corpora: mutual information vs, 1994.

[7] Vidas Daudaravicius, et al. Gravity Counts for the boundaries of collocations, 2004.

[8] G. Underwood, et al. The eyes have it: An eye-movement study into the processing of formulaic sequences, 2004.

[9] James H. Martin, et al. Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition, 2nd Edition, 2000, Prentice Hall series in artificial intelligence.

[10] Douglas Biber, et al. A corpus-driven approach to formulaic language in English: multi-word patterns in speech and writing, 2009.

[11] Stefanie Wulff, et al. Corpus-linguistic applications: current studies, new directions, 2010.

[12] N. Snider, et al. More than words: Frequency effects for multi-word phrases, 2010.

[13] Tarek M. Sobh, et al. Using Grapheme n-Grams in Spelling Correction and Augmentative Typing Systems, 2008.

[14] Braj B. Kachru. The Indianization of English, 1986, English Today.

[15] Larry E. Smith, et al. Cultures, Contexts, and World Englishes, 2006.

[16] Braj B. Kachru. Asian Englishes Beyond the Canon, 2005.

[17] Douglas Biber, et al. A Corpus Linguistic Investigation of Vocabulary-based Discourse Units in University Registers, 2004.

[18] S. Gries. Corpus linguistics and theoretical linguistics: A love–hate relationship? Not necessarily…, 2010.

[19] Max M. Louwerse, et al. Multi-dimensional register classification using bigrams, 2007.

[20] Hinrich Schütze, et al. Book Reviews: Foundations of Statistical Natural Language Processing, 1999, CL.

[21] Elena Tognini-Bonelli, et al. Corpus Linguistics at Work, 2002, Computational Linguistics.

[22] Bas Aarts, et al. Exploring Natural Language: Working with the British Component of the International Corpus of English, 2002.

[23] Edgar W. Schneider, et al. The Dynamics of New Englishes: From Identity Construction to Dialect Birth, 2003.

[24] Kenneth C. Hill. International English: A guide to varieties of Standard English, 2004, Language in Society.

[25] Geoffrey Sampson, et al. Word frequency distributions, 2002, Computational Linguistics.

[26] Stefan Th. Gries, et al. Exploring variability within and between corpora: some methodological considerations, 2006.

[27] Stefan Th. Gries, et al. N-grams and the clustering of genres, 2009.

[28] Sidney Greenbaum, et al. Comparing English worldwide: the International Corpus of English, 1996.

[29] Colin Bannard, et al. Stored Word Sequences in Language Learning, 2008, Psychological Science.

[30] Morten H. Christiansen, et al. Processing of relative clauses is made easier by frequency of occurrence, 2007.

[31] A. Goldberg. Constructions at Work: The Nature of Generalization in Language, 2006.

[32] Kingsley Bolton, et al. English in Asia, Asian Englishes, and the issue of proficiency, 2008, English Today.

[33] Joybrato Mukherjee, et al. Ditransitive Verbs in Indian English and British English: A Corpus-linguistic Study, 2007.

[34] Christian Mair. British English/American English Grammar: Convergence in Writing – Divergence in Speech?, 2007.

[35] Stefan Th. Gries, et al. Collostructions: Investigating the interaction of words and constructions, 2003.

[36] Eniko Csomay, et al. Lexical bundle distribution in university classroom talk, 2010.

[37] Joybrato Mukherjee. Corpus linguistics versus corpus dogmatism: pace Wolfgang Teubert, 2010.

[38] P. Trudgill, et al. International English: A guide to varieties of standard English, 1985.

[39] Douglas Biber, et al. Dimensions of Register Variation: A Cross-Linguistic Comparison, 1995.

[40] Daniel Wiechmann. On the computation of collostruction strength: Testing measures of association as expressions of lexical bias, 2008.

[41] R. Xiao. Multidimensional analysis and the study of world Englishes, 2009.

[42] Rakesh Mohan Bhatt, et al. World Englishes: The Study of New Linguistic Varieties, 2008.

[43] Stefan Th. Gries, et al. Collostructional nativisation in New Englishes: Verb-construction associations in the International Corpus of English, 2009.

[44] Dan Jurafsky, et al. Effects of disfluencies, predictability, and utterance position on word form variation in English conversation, 2003, The Journal of the Acoustical Society of America.

[45] D. Biber, et al. If you look at …: Lexical Bundles in University Teaching and Textbooks, 2004.

[46] W. B. Cavnar, et al. N-gram-based text categorization, 1994.

[47] Constantin Orasan, et al. A corpus-based investigation of junk emails, 2002, LREC.

[48] Stefan Th. Gries, et al. Bigrams in registers, domains, and varieties: a bigram gravity approach to the homogeneity of corpora, 2009.

[49] Joybrato Mukherjee, et al. Describing verb-complementational profiles of New Englishes: A pilot study of Indian English, 2006.

[50] David C. S. Li. The Functions and Status of English in Hong Kong: A Post-1997 Update, 1999.

[51] Rajend Mesthrie, et al. Africa, South and Southeast Asia, 2008.

[52] Braj B. Kachru. English as an Asian Language, 1998.