CASS: a phonetically transcribed corpus of Mandarin spontaneous speech

The Chinese Annotated Spontaneous Speech (CASS) corpus is a collection of spoken Chinese that has been phonetically transcribed to capture spontaneous speech and language effects. The corpus was created to begin collecting samples of most of the phonetic variations that pronunciation effects produce in Mandarin spontaneous speech, including allophonic changes, phoneme reduction, phoneme deletion and insertion, and duration changes. It is intended for use in pronunciation modeling for improved automatic speech recognition, and it will be used at the 2000 Johns Hopkins University Language Engineering Workshop by the project on Pronunciation Modeling of Mandarin Casual Speech.
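As a rough illustration of how the variation types listed above might be counted for pronunciation modeling, the sketch below aligns a canonical phone sequence with the sequence actually transcribed and labels each difference as a substitution, deletion, or insertion. It is only a sketch under stated assumptions: the function name `classify_variation`, the phone labels, and the list-of-strings input format are all hypothetical and do not reflect the CASS annotation format, and the alignment uses Python's generic `difflib.SequenceMatcher` rather than any method described for the corpus. Duration changes are not covered, since they would need timing information.

```python
from difflib import SequenceMatcher

# Map difflib opcode tags to pronunciation-variation labels.
_LABELS = {"replace": "substitution", "delete": "deletion", "insert": "insertion"}


def classify_variation(canonical, surface):
    """Align two phone sequences and list the differences.

    canonical: list of phone labels for the dictionary pronunciation
    surface:   list of phone labels for the transcribed pronunciation
    Returns a list of (label, canonical_phones, surface_phones) tuples.
    """
    matcher = SequenceMatcher(a=canonical, b=surface, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # phones realized as expected
        changes.append((_LABELS[tag], canonical[i1:i2], surface[j1:j2]))
    return changes


if __name__ == "__main__":
    # Hypothetical example: the medial consonant is dropped in casual speech.
    for change in classify_variation(["zh", "e", "g", "e"], ["zh", "e", "e"]):
        print(change)
```

Tallying these labels over many canonical/surface pairs would give the kind of variation statistics a pronunciation model could be trained on, though a phonetically informed alignment cost would likely be preferable to plain sequence matching.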