SPCS Speech Corpus

Broadband speech corpus of approximately 10 hours and the corresponding transcriptions. The development process of the corpus involved the recording and transcribing of radio broadcasts. The transcriptions were used to generate the Sepedi code-switched prompts to re-record speech from multiple speakers.

The following sub-directories are found in this directory:

Audio: Audio files for all the recorded code-switched speech
Transcriptions: The corresponding orthographic transcriptions
Metadata: Information about the speakers and the transcriptions
Documentation: The directory structure and the Sepedi prompt list
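
The sub-directory layout listed above can be checked after download with a short script. The following is a minimal sketch in Python, assuming the four sub-directory names are spelled on disk exactly as they appear in this description; the function name missing_subdirs is hypothetical and not part of the corpus distribution.

```python
from pathlib import Path

# Sub-directory names as listed in this description.
# Assumption: the names on disk use exactly this spelling and capitalisation.
EXPECTED_SUBDIRS = ("Audio", "Transcriptions", "Metadata", "Documentation")


def missing_subdirs(corpus_root):
    """Return the expected sub-directories that are absent under corpus_root."""
    root = Path(corpus_root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]
```

Calling missing_subdirs with the path to the unpacked corpus returns an empty list when all four sub-directories are present, and otherwise the names that are missing.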