Introduction: compiling and analysing the Spoken British National Corpus 2014

For over twenty years, the British National Corpus (BNC) has been one of the most widely known and widely used corpora. It is almost impossible to attend an international corpus linguistics conference such as Corpus Linguistics, ICAME (International Computer Archive of Modern and Medieval English), AACL (American Association for Corpus Linguistics) or APCLC (Asia Pacific Corpus Linguistics Conference) without encountering several papers that employ the BNC in some way. Focusing on the 10-million-word spoken component of the BNC, Love et al. (this issue) show that no orthographically transcribed spoken corpus compiled since the release of the BNC has matched the Spoken BNC in either size or availability. Unsurprisingly, the corpus linguistics community has for some time used the Spoken BNC as a proxy for "present-day" spoken British English. As Love et al. (this issue) argue, the fact that this "go-to" dataset is over twenty years old is a problem for current and future research, and one that needs to be addressed with increasing urgency.

The collaboration between Cambridge University Press (CUP) and the ESRC Centre for Corpus Approaches to Social Science (CASS)¹ at Lancaster University to build the Spoken BNC2014 came about after both centres had spent some years working independently towards the same goal: addressing this situation by compiling a new corpus of spoken British English that could, in some way, match up to the Spoken BNC.² Claire Dembry at CUP had collected two million words of new spoken data for the Cambridge English Corpus between 2012 and 2014, trialling the public participation method that was retained, along with the data itself, in the Spoken BNC2014.