The English Dialects App: The creation of a crowdsourced dialect corpus

In this paper, we present the English Dialects App (EDA) and the English Dialects App Corpus (EDAC). EDA is a free iOS and Android app, launched in January 2016, that features a dialect quiz and dialect recordings. For the quiz, users indicate which variants of 26 words they use, and the application guesses their local dialect; for the recordings, users record a short text. The result is EDAC, which includes metadata on mobility, ethnicity, age, educational level, and gender. More than 47,000 users from across the UK have indicated dialect variants for these 26 words, and more than 3,500 users have provided audio recordings. Because smartphone-based research reaches a specific stratum of the population, EDAC unavoidably does not fully reflect the distributions of age, ethnicity, qualification levels, and other parameters found in the UK population. Yet the sampling strategy also has clear benefits; both benefits and pitfalls are discussed in this article. Future analyses will provide the most comprehensive understanding of English regional dialect variation since the work of the traditional dialectologists. We showcase two such analyses in this article. We demonstrate that EDAC should be of interest to researchers not only in dialectology but also in forensic phonetics.
