USTC95: a Putonghua corpus
This paper introduces USTC95 (University of Science and Technology of China, '95), a large corpus of the standard spoken Chinese dialect commonly known as Putonghua or Mandarin. It is designed primarily to support research in Chinese speech recognition and analysis and the evaluation of recognition systems. The corpus consists of four major sub-corpora: isolated syllables, multi-syllable words, sentences, and telephone speech. It is carefully designed to cover all the phones and monosyllables of Putonghua, as well as its co-articulation effects, while keeping redundancy to a minimum. This parsimonious design makes it possible to acquire acoustic-phonetic knowledge for isolated-word recognition and continuous Chinese speech recognition, to provide speech data for training a telephone speech recognizer, and to provide a common test base for assessing recognizer performance.
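The abstract does not say how the corpus material was chosen. As an illustration only, the goal of covering every unit with little redundancy can be sketched as a greedy set-cover selection. Everything here is an assumption made for the example, not the authors' procedure: the function name `greedy_select`, the toy candidate items, and their pinyin syllable lists are all hypothetical.

```python
# Hypothetical sketch: greedy set cover as a stand-in for "full coverage,
# minimal redundancy". This is NOT the selection method used for USTC95.

def greedy_select(candidates, required_units):
    """Greedily pick candidate items until every required unit is covered.

    candidates: dict mapping an item name to the list of units it contains.
    required_units: iterable of units (e.g. syllables) that must be covered.
    Returns (selected item names in pick order, set of units left uncovered).
    """
    uncovered = set(required_units)
    remaining = dict(candidates)
    selected = []
    while uncovered and remaining:
        # Pick the item covering the most still-uncovered units;
        # iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(remaining),
                   key=lambda name: len(uncovered & set(remaining[name])))
        gain = uncovered & set(remaining[best])
        if not gain:
            break  # no remaining item adds new coverage
        selected.append(best)
        uncovered -= gain
        del remaining[best]
    return selected, uncovered


# Toy data (invented for illustration): items and the syllables they contain.
candidates = {
    "w1": ["zhong", "guo"],
    "w2": ["guo", "jia"],
    "w3": ["zhong", "guo", "jia"],
    "w4": ["ke", "xue"],
}
required = {"zhong", "guo", "jia", "ke", "xue"}

selected, uncovered = greedy_select(candidates, required)
print(selected, uncovered)
```

In this toy case, two of the four items already cover all five syllables, so the other two would add only redundant material. Greedy selection is a common approximation for this kind of coverage problem and does not guarantee the smallest possible set.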