“i didn’t spel that wrong did i. Oops”: Analysis and normalisation of SMS spelling variation

Spelling variation, although present in all varieties of English, is particularly prevalent in SMS text messaging. Researchers argue that spelling variants in SMSes are principled and meaningful, reflecting patterns of variation across historical and contemporary texts and contributing to the performance of social identities. However, for most languages (French being the notable exception) little attempt has yet been made to validate SMS spelling patterns empirically or to verify the extent to which they mirror those in other texts. This article reports on the use of the VARD2 tool to analyse and normalise the spelling variation in a corpus of over 11,000 SMSes collected in the UK between 2004 and 2007. A second tool, DICER, was used to examine the mappings between variants and their standard equivalents in the normalised corpus. The resulting database of rules and frequencies enables comparison with other text types and the automatic normalisation of spelling in larger SMS corpora. In addition to examining spelling trends through the DICER analysis, it was possible to place the spelling variants found in the SMS corpus into functional categories, with the ultimate aim of creating a taxonomy of SMS spelling. The article reports the findings of this categorisation process and discusses the difficulty of choosing categories for some spelling variants.
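
To make the idea of a "database of rules and frequencies" concrete, the following is a minimal sketch of how character-level rules might be derived from pairs of variants and their normalised equivalents. It is not the DICER implementation: it substitutes the alignment provided by `difflib.SequenceMatcher` from the Python standard library, and the function name `extract_rules`, the position labels and the sample pairs are all assumptions made for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher


def extract_rules(pairs):
    """Count character-level edits that turn each variant into its equivalent.

    Each rule is a tuple (operation, variant_chars, equivalent_chars, position),
    where position is a rough label: 'start', 'end' or 'middle' of the variant.
    This is an illustrative approximation, not the DICER algorithm.
    """
    rules = Counter()
    for variant, equivalent in pairs:
        matcher = SequenceMatcher(None, variant, equivalent)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                continue  # unchanged characters carry no rule
            if i1 == 0:
                position = "start"
            elif i2 == len(variant):
                position = "end"
            else:
                position = "middle"
            rules[(tag, variant[i1:i2], equivalent[j1:j2], position)] += 1
    return rules


# Hypothetical variant/equivalent pairs, invented for this example.
sample_pairs = [
    ("spel", "spell"),
    ("goin", "going"),
    ("havin", "having"),
    ("wot", "what"),
]

rules = extract_rules(sample_pairs)
for rule, count in rules.most_common():
    print(count, rule)
```

In this sketch, word-final g-dropping ("goin", "havin") surfaces as a single rule with frequency 2. A frequency table of this kind is the sort of output that could then be compared across text types or used to rank candidate normalisations.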
