Spoken Corpus Design

This paper describes the two-part approach to spoken corpus design adopted by the British National Corpus project. The demographic approach uses demographic parameters to sample the everyday speech of the population of British English speakers in the UK. The context-governed approach is designed to cover the full range of linguistic variation found in spoken language, using a typology based on four contextual categories. Details of the processing of recordings are given, together with a description of the context features included in the corpus.